Abstract

Biliary excretion is one of the main elimination pathways for drugs and/or their
metabolites. Therefore, an insight into the structural profile of cholephilic
compounds through accurate modelling of biliary excretion is important for the
estimation of clinical pharmacokinetics in the early stages of drug discovery. The
aim of this project was to develop Quantitative Structure-Activity Relationships
(QSAR) as computational tools for the estimation of biliary excretion. In addition,
the structural requirements for biliary excretion were investigated in relation to
the structural requirements for binding to the uptake and efflux transporter
proteins that are involved in hepatobiliary elimination.

The study used three datasets: (1) the percentage of dose excreted intact into bile
in rat for 217 compounds; (2) P-gp inhibition constants for 219 compounds; and
(3) percentage inhibition of the OATP transporters OATP1B1, OATP1B3 and OATP2B1.
The statistical techniques were stepwise regression analysis, Classification and
Regression Trees (C&RT), Chi-square Automatic Interaction Detector (CHAID), Boosted
Trees (BT), Random Forest (RF) and Multivariate Adaptive Regression Splines (MARS).

The study resulted in reasonable QSARs for the prediction of biliary excretion,
P-gp binding constants and percentage inhibition of OATPs, along with QSARs
incorporating predicted P-gp and OATP inhibition values for the prediction of
biliary excretion. Simple regression tree models were of similar accuracy to the
boosted trees model in estimating the percentage of biliary excretion of compounds.
Molecular descriptors selected by these models indicated higher biliary excretion
for relatively hydrophilic compounds, especially if they undergo acid/base
dissociation and have a large molecular size (above 348 Da).

The major role of OATPs in biliary excretion was indicated using interactive
decision tree models, with OATP1B1 binding being the most successful predictor of
biliary excretion amongst the three OATP subfamilies. In contrast, predicted P-gp
binding parameters were not successful in the prediction of biliary excretion. This
may be due to problems in extrapolating the in vitro P-gp binding data to the in
vivo situation, or to the difference between the chemical spaces of the P-gp and
biliary excretion datasets, which may cause compounds in the biliary excretion
dataset to fall outside the applicability domain of the P-gp models.

1. Introduction

1.1. Drug Discovery and Development

The discovery and development of a drug is a very expensive process (Djulbegovic et
al., 2014). Toxicity, poor efficacy and poor bioavailability are the main reasons
for failure during the discovery, development and registration of drug candidates
(Gad, 2005). Early identification of poor candidates is essential for reducing the
cost and the resources spent on drug discovery and development. For most drugs,
discovery and development can be a remarkably long process. For example, taxol, a
chemotherapeutic drug, took nearly 30 years from the initial stage to approval by
the Food and Drug Administration (FDA) (Rowinsky et al., 1990). There has been a
steady decline in the number of drugs approved for marketing by regulatory agencies
since the 1960s, despite advances in drug discovery technology and increasing
investment by pharmaceutical companies. The trend runs from 70-100 drugs introduced
in the 1960s and 60-70 drugs in the 1970s, to about 50 in the 1980s and fewer than
40 in the 1990s and after (Hillisch and Hilgenfeld, 2003) (Figure 1.1).

Figure 1.1. Number of new drugs introduced from the 1960s to 1990s (adapted
from Hillisch and Hilgenfeld, 2003).

One of the factors considered responsible for this decline is the stricter control
of the process by regulatory agencies such as the FDA to ensure the safety of
compounds before approval. This leads to high attrition rates and a prolonged drug
development process (Hillisch and Hilgenfeld, 2003). The major causes of failure of
new molecular entities (NMEs) in drug development have been attributed to poor
pharmacokinetics (39%) and animal toxicity (11%) (Waterbeemd and Gifford, 2003;
Rang, 2006).

Drug candidates normally undergo prior investigation, with selection of those
compounds that have optimal properties, including physicochemical parameters
(Lipinski et al., 1997). According to Kerns and Di (2008), important properties in
drug discovery can be classified into four groups: (1) structural properties, e.g.
hydrogen bonding, lipophilicity, molecular weight (MW), pKa, polar surface area,
shape and reactivity; (2) physicochemical properties, such as solubility,
permeability and chemical stability; (3) biochemical properties, such as metabolism
and transport; and (4) pharmacokinetics and toxicity, e.g. clearance, half-life,
bioavailability and LD50.

Initial identification of drug candidates is based mainly on the ability of
compounds to show the desired activity and selectivity against a target (e.g. an
inhibitory effect). Investigation of other properties is traditionally postponed to
later stages of the development process, due in part to the success of
pharmaceutics research in achieving adequate absorption or bioavailability of drug
molecules (Bleicher et al., 2003). Recently, with the advent of modern technologies
in drug discovery, including in silico methods, and to address the problem of high
attrition rates, the pharmaceutical industry has begun screening potential drug
candidates for their pharmacokinetic and physicochemical properties much earlier in
drug development (Rang, 2006). A better approach still, which helps facilitate the
success and approval of a drug molecule, is the use of predictive tools in the
design phase of the synthesis of compound libraries (Waterbeemd and Gifford, 2003).

Nowadays, in vitro methods and statistical modelling are used extensively in the
development of drugs. These methods reduce the need for more expensive in vivo
experiments. Model development in drug development is usually empirical or
exploratory in nature. Models are developed using experimental data and then
refined until a reasonable balance is obtained between overfitting and underfitting
(Bonate, 2006). Computational modelling may also be helpful in assay systems,
resulting in faster discovery of new potential drugs (Bronchud et al., 2008).

The ability to predict ADME properties, as well as knowledge of the
binding/modulating properties of drug molecules on membrane transporter proteins,
is important because these properties inherently contribute to pharmacokinetics.
Transporters such as P-glycoprotein belong to the ATP-binding cassette superfamily
of membrane transporters (Poongavanam et al., 2012). The FDA has urged that every
new molecular entity should be routinely checked for a possible interaction with
P-glycoprotein (FDA Guidelines, 2014). Thus, in the lead optimisation process,
early identification of membrane transport protein ligands, whether substrates or
inhibitors, is of utmost importance for improving the ADME profile of drug
candidates (Bleicher et al., 2003; Di Pietro et al., 2002).

1.2. Pharmacokinetics

Absorption, distribution, metabolism and excretion (ADME) are the main processes in
the biological disposition of a drug. Following drug administration, and depending
on the site of administration, the drug concentration increases in the blood and
plasma, and consequently in the tissues, due to the absorption process. This is
followed by a decline in plasma concentration due to drug distribution into tissues
and elimination. Pharmacokinetics (PK) is the study of the time course of drug
concentration in the body. In addition to dosage regimen decisions, applications of
pharmacokinetic studies include bioavailability measurements; assessing the effect
of physiological and pathological conditions on drug distribution, elimination and
absorption; dosage adjustment in disease states when necessary; correlation of
pharmacological responses with administered doses; evaluation of drug interactions;
and clinical prediction, using pharmacokinetic parameters to individualise the drug
dosing regimen (Jambhekar and Breen, 2009). In general, the PK parameters of a drug
result from its physicochemical and biochemical properties. These properties are
determined by the structure of the drug (Kerns and Di, 2008).

Absorption is the first pharmacokinetic process, preceding distribution and
elimination. After an orally administered dosage form enters the gastric fluid, the
drug is gradually released from the formulation and the absorption process starts
(Rosenbaum, 2011). In this phase, the dissolved drug has the chance to pass through
the gastrointestinal (GI) membrane into the blood. Passive absorption is thought to
be the main mechanism of absorption for most drugs. However, uptake transporters
(carrier proteins) in the intestinal epithelial membrane may facilitate the
absorption process. Conversely, drug absorption may be reduced if efflux
transporters in the enterocyte membrane return the drug to the lumen (Rosenbaum,
2011). Absorption of proteins and macromolecular drugs from the GI tract is
difficult because of their large size and, therefore, parenteral administration is
the predominant route of delivery for these drugs (Pandit, 2007). Other routes of
administration include the transdermal route, in which the drug is applied to the
skin for systemic absorption; the respiratory route, in which the drug is inhaled
into the lungs and absorbed mainly in the alveoli; and the nasal route, in which
the nasal mucosa, with its good blood supply, can absorb drugs quickly, depending
on the duration of drug contact with the mucosa (Pandit, 2007).

Distribution is the next important phase in pharmacokinetics; it controls drug
concentrations in the tissues and the observed pharmacological response. Drug
distribution to peripheral tissues depends on four main factors: (1) the drug
concentration; (2) the physicochemical properties of the drug; (3) the blood flow
to the tissue; and (4) the affinity of the drug for the tissue relative to its
affinity for plasma proteins. Amongst these factors, physicochemical properties
such as the acid dissociation constant and molecular weight (MW) are some of the
most influential in tissue distribution (Riviere, 2011). In addition, the rate of
drug metabolism plays a key role in distribution, since readily metabolised
compounds are less available for tissue distribution (Riviere, 2011). Metabolism
plays an essential role in drug elimination. Drugs that are very rapidly or very
slowly cleared can present problems for accurate control of plasma levels and, with
persistent compounds of very long half-lives, the risk of toxicity can be
considerable (Coleman, 2005). First-pass metabolism occurs when a drug is
metabolised before reaching the systemic circulation. It may happen in both the
liver and the gut (Chesnokova et al., 2007). In general, the liver is the most
important, and sometimes the only, site of metabolism. Extensive metabolism in one
or more other tissues, such as the kidney, lung and gastrointestinal membrane, is
rarely observed (Tozer and Rowland, 2006).

In addition to metabolism, excretion by the kidneys and the liver is a main route
of drug elimination. The kidney is the main organ of excretion, while several
compounds are excreted in bile. Renal excretion is mainly by glomerular filtration
(Rosenbaum, 2011). Drugs that are secreted into the bile finally pass into the
intestine, where they may be reabsorbed; this process is known as enterohepatic
circulation. The route and the rate of a drug's elimination have major consequences
for pharmacokinetics, drug-drug interactions and pharmacotherapy in general. The
elimination process is discussed in greater detail in Section 1.3.

1.3. Elimination of Drugs

Drugs can be eliminated by metabolism or excretion. Excretion is the process that
removes a drug from tissues and the circulation (DiPiro et al., 2010). In theory,
therefore, excretion could include discharge into the urine, faeces (via bile from
the liver), exhaled air (via the lungs) or sweat (via the skin). However, for most
drugs, the primary routes of excretion are renal excretion into the urine via the
kidneys and/or biliary excretion into the bile via the liver (Taft, 2009). Renal
excretion is more common for water-soluble molecules; hence, many polar drugs with
low log P values are excreted unchanged directly into the urine. Lipophilic drugs
may undergo tubular reabsorption, moving from the urine (in the tubule of the
nephron) back into the peritubular capillaries, and consequently cannot be
eliminated by renal excretion. For these drugs, hepatic clearance may be the main
route of elimination. The primary purpose of hepatic metabolism is to create more
hydrophilic molecules that will not be reabsorbed and, thus, can be excreted from
the body in the urine or bile. Most drugs are lipophilic in nature and are
eliminated by metabolism or biotransformation (Rosenbaum, 2011). Larger drug
molecules (high molecular weight) and glucuronide and glutathione conjugates are
more likely to be excreted via the liver into the bile. Compounds that are excreted
into the bile end up in the intestine, where they may be eliminated in the faeces
or reabsorbed (Taft, 2009).

Clearance is a parameter that indicates the rate at which a drug is cleared from
the body. It is defined as the volume of plasma from which all drug is removed in a
given time, expressed in units of volume per time (Stringer, 2006). This powerful
parameter is used in pharmacokinetics for the evaluation of elimination and for
clinical applications. Clearance may be viewed as the proportionality factor
between the drug elimination rate and the drug concentration (Eq. 1.1):

$$\text{Rate of elimination} = Cl \cdot C \qquad \text{(Eq. 1.1)}$$

where $C$ is the blood concentration (Tozer and Rowland, 2006). As Eq. 1.1 shows,
clearance relates the rate of drug elimination to the concentration. Total
clearance ($Cl_T$), also called total body clearance or systemic clearance, is the
sum of all the component clearances by the different body organs (Rosenbaum, 2011),
as given by Eq. 1.2:

$$Cl_T = Cl_R + Cl_H + Cl_{other} \qquad \text{(Eq. 1.2)}$$

In Eq. 1.2, $Cl_T$ is the total body clearance, $Cl_R$ is the renal clearance,
$Cl_H$ is the hepatic clearance, and $Cl_{other}$ indicates any other form of
clearance.
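As a minimal sketch of how Eq. 1.1 and Eq. 1.2 combine, the Python functions below add the component clearances and then convert the total into an elimination rate. The function names and all numerical values are hypothetical and were chosen only for illustration; they are not taken from the study.

```python
def elimination_rate(clearance_ml_min: float, conc_mg_ml: float) -> float:
    """Eq. 1.1: rate of elimination (mg/min) = Cl (ml/min) * C (mg/ml)."""
    return clearance_ml_min * conc_mg_ml


def total_clearance(cl_renal: float, cl_hepatic: float, cl_other: float = 0.0) -> float:
    """Eq. 1.2: Cl_T = Cl_R + Cl_H + Cl_other (all in the same units)."""
    return cl_renal + cl_hepatic + cl_other


# Hypothetical values chosen only for illustration (ml/min and mg/ml).
cl_t = total_clearance(cl_renal=40.0, cl_hepatic=55.0, cl_other=5.0)
rate = elimination_rate(cl_t, conc_mg_ml=0.002)
```

With these assumed inputs the component clearances sum to 100 ml/min, so a plasma concentration of 0.002 mg/ml corresponds to an elimination rate of about 0.2 mg/min.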

The compartmental model below shows how elimination from the body can be calculated
(Patric, 2006):

$$\frac{dX}{dt} = -k_{el} \cdot X \qquad \text{(Eq. 1.3)}$$

where $X$ is the amount of drug in the body, $t$ is the time after administration
of the dose and $k_{el}$ is the elimination rate constant.

Integration of Eq. 1.3 gives:

$$X = X_0 \cdot e^{-k_{el} t}, \quad \text{then} \quad \log X = \log X_0 - \frac{k_{el}\, t}{2.303} \qquad \text{(Eq. 1.4)}$$

where $X_0$ represents the initial amount of drug in the body.

Alternatively, $k_{el}$ can be calculated from other pharmacokinetic parameters
(Eq. 1.5):

$$k_{el} = \frac{Cl_T}{V_d} \qquad \text{(Eq. 1.5)}$$

where $V_d$ represents the apparent volume of distribution.

Furthermore, total clearance and volume of distribution can be calculated from the
following equations:

$$Cl_T = \frac{-\,dX/dt}{C} \quad \text{and} \quad V_d = \frac{X}{C} \qquad \text{(Eq. 1.6)}$$

Here, $C$ stands for the drug concentration in plasma.
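The first-order relationships in Eq. 1.3 to Eq. 1.5 can be sketched numerically as follows. This is an illustrative example only: the drug parameters are hypothetical, and the half-life helper is a standard consequence of the exponential decay in Eq. 1.4 rather than an equation given in this section.

```python
import math


def k_el(cl_total: float, v_d: float) -> float:
    """Eq. 1.5: elimination rate constant (1/h) from Cl_T (L/h) and V_d (L)."""
    return cl_total / v_d


def amount_remaining(x0: float, k: float, t: float) -> float:
    """Eq. 1.4: X = X0 * exp(-k_el * t)."""
    return x0 * math.exp(-k * t)


def half_life(k: float) -> float:
    """Time for X to fall to X0/2; follows from Eq. 1.4 as ln(2) / k_el."""
    return math.log(2) / k


# Hypothetical drug: Cl_T = 5 L/h, V_d = 50 L, initial amount X0 = 100 mg.
k = k_el(5.0, 50.0)
t_half = half_life(k)
x_after = amount_remaining(100.0, k, t_half)

# The logarithmic form of Eq. 1.4 gives the same answer (2.303 is ln 10).
log_x_after = math.log10(100.0) - k * t_half / math.log(10)
```

Under these assumptions the rate constant is 0.1 per hour, the half-life is close to 6.9 hours, and half of the initial amount remains after one half-life.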

1.3.1. Renal Excretion

Renal excretion is a vital process by which the products of metabolism and waste
metabolites are cleared from the organism (DiPiro et al., 2010). Although the
kidneys have several functions, their main function is maintaining homeostasis by
regulating fluid and electrolyte balance. The kidneys are also responsible for the
reabsorption of water, glucose and amino acids (Pandit, 2007). Renal elimination of
drugs involves three processes: glomerular filtration, proximal tubular secretion
and distal tubular reabsorption (Stringer, 2006). As stated above, water-soluble
materials are excreted more readily by the kidney (Haschek et al., 2010). The
acidic or basic nature of a drug and the pH of the urine are important determinants
of the fate of a drug in renal excretion (Haschek et al., 2010). Active tubular
secretion and glomerular filtration are the main pathways in renal elimination
(Haschek et al., 2010).

A glomerulus is a large knot of capillaries surrounded by Bowman's capsule; 120 to
150 ml of blood is filtered at the glomerular capillaries per minute. The
glomerular capillaries are fenestrated and freely permeable to water, electrolytes
and most plasma constituents. The pore size of these capillaries permits most
agents and drugs with a molecular weight below 67 kDa to pass through and return to
plasma (Smith, 2006).

If a drug does not bind to plasma proteins (such as albumin) and is small enough to
be filtered in the glomerulus, then its clearance by glomerular filtration is equal
to the glomerular filtration rate (GFR):

$$Cl_{GF} = GFR \qquad \text{(Eq. 1.7)}$$

In Eq. 1.7, $Cl_{GF}$ is the clearance by glomerular filtration. However, many
drugs bind to plasma proteins, and bound drug is not filtered. If $f_u$ is the
unbound fraction of the drug, the glomerular clearance can be calculated by Eq. 1.8
(Janku, 1993):

$$Cl_{GF} = f_u \cdot GFR \qquad \text{(Eq. 1.8)}$$
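The effect of plasma protein binding on filtration clearance (Eq. 1.7 and Eq. 1.8) can be illustrated with a short Python sketch. The GFR value and the degree of binding used here are assumed for illustration only.

```python
def glomerular_clearance(gfr_ml_min: float, fraction_unbound: float = 1.0) -> float:
    """Eq. 1.8: Cl_GF = f_u * GFR; with f_u = 1 this reduces to Eq. 1.7."""
    if not 0.0 <= fraction_unbound <= 1.0:
        raise ValueError("fraction_unbound must lie between 0 and 1")
    return fraction_unbound * gfr_ml_min


# Hypothetical example: GFR of 120 ml/min.
cl_unbound_drug = glomerular_clearance(120.0)      # no protein binding (Eq. 1.7)
cl_bound_drug = glomerular_clearance(120.0, 0.10)  # 90% bound, f_u = 0.10 (Eq. 1.8)
```

In this example a drug that is 90% bound to plasma proteins has a filtration clearance one tenth of that of an unbound drug of similar size.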


Some of the chemicals that are filtered at the glomerulus are reabsorbed by active
transport systems found primarily in the proximal tubules. In the proximal renal
tubules, two systems are primarily responsible for the active tubular secretion of
drugs, one for organic anions and another for organic cations. The anionic system
(OAT transporters) transports organic acids such as penicillins, indomethacin and
glucuronides. The cationic system (OCT transporters) transports organic bases such
as morphine, procaine and quaternary ammonium compounds. Both active and passive
transport are involved in the tubular secretion process (Burckhardt and Wolff,
2000). It is worth mentioning that P-glycoprotein is present in the brush border of
the renal proximal tubules and can play a role in the active tubular secretion of
exogenous substances. This pump is involved in the tubular secretion of, for
example, digoxin, and can be inhibited by quinidine or verapamil, leading to an
increase in digoxin serum concentrations (Giacomini et al., 2010). Some drugs can
inhibit the secretory function of the tubules and consequently reduce renal
clearance. Probenecid, which is also used in treating gout and hyperuricaemia, is a
good example of a drug that can inhibit the tubular secretion of several agents,
such as verapamil (Piscitelli et al., 2005).

Renal clearance ($Cl_R$) is the volume of plasma that is cleared of a compound by
the kidneys per unit time, and can be calculated by Eq. 1.9 (Rosenbaum, 2011):

$$Cl_R = \frac{C_{ur} \cdot Q_R}{C} \qquad \text{(Eq. 1.9)}$$

where $Cl_R$ is the renal clearance of a compound, $C_{ur}$ is the drug
concentration in the urine, $C$ is the plasma concentration and $Q_R$ is the urine
flow rate (ml/min).
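A worked instance of Eq. 1.9 is sketched below in Python. The measurements are hypothetical; the only requirement is that the urine and plasma concentrations are expressed in the same units.

```python
def renal_clearance(c_urine: float, urine_flow_ml_min: float, c_plasma: float) -> float:
    """Eq. 1.9: Cl_R = (C_ur * Q_R) / C, in ml/min.

    c_urine and c_plasma must share the same concentration units.
    """
    if c_plasma <= 0:
        raise ValueError("plasma concentration must be positive")
    return c_urine * urine_flow_ml_min / c_plasma


# Hypothetical measurements: urine 50 ug/ml, urine flow 1.5 ml/min, plasma 1 ug/ml.
cl_r = renal_clearance(c_urine=50.0, urine_flow_ml_min=1.5, c_plasma=1.0)
```

For these assumed values the renal clearance is 75 ml/min.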

1.3.2. Elimination by the Liver

The liver is a major elimination organ, which eliminates drugs by metabolism and
biliary excretion. One of the most important functions of the liver is the
formation of bile. However, the liver is generally identified with its primary role
in drug metabolism.

Bile is a mixture of bile acids and other components, such as phospholipids,
bilirubin and cholesterol, that is formed in the canaliculus between adjacent
hepatocytes and is actively discharged across the canalicular membrane. Many drugs
are also excreted through this system in significant quantities (Taft, 2009). Each
day hepatocytes secrete up to 1 litre of bile, a yellow-brown or olive-green liquid
with a pH of 6.7-8.6 that consists mostly of water, bile salts, cholesterol,
lecithin, bile pigments and numerous ions. The principal bile pigment is bilirubin.
The phagocytosis of aged red blood cells liberates iron, globin and bilirubin. The
iron and globin are recycled to the bone marrow, and the bilirubin is secreted into
the bile and finally broken down in the intestine. One of its breakdown products is
stercobilin, which gives faeces their normal brown colour. Bile is partially an
excretory product and partially a digestive secretion (Tortora and Derrickson,
2006). The resulting bile is stored in the gallbladder and released into the
intestine.

Once bile is released into the intestine, some metabolites and unchanged drugs
continue on the path of elimination through the faeces. Others, mostly
lipid-soluble drugs, are reabsorbed from the intestine and return to the systemic
circulation (Luscombe and Nicholis, 1998). This process is known as enterohepatic
circulation, and it affects pharmacokinetics by keeping the plasma concentration
high (Plusquellec et al., 1998). Despite the possibility of reabsorption, bile
plays an important role in the excretion of xenobiotics, including drugs and their
metabolites, in addition to its physiological role in the intestinal digestion of
lipids and lipid-soluble vitamins. These xenobiotics comprise a diverse array of
compounds, both polar and lipophilic, including anions, cations and neutral
molecules (Taft, 2009). Elimination of some drugs, e.g. oestrogens, is very slow,
while water-soluble drugs are quickly excreted in the faeces through the intestine
(Smith, 2006). Enterohepatic cycling and biliary elimination can continue until the
compound is ultimately eliminated from the body by faecal or renal excretion or by
metabolism.

Hepatic clearance ($Cl_H$) (by metabolism and/or biliary excretion) is defined as
the volume of blood from which drug is removed completely by the liver per unit
time. Hepatic clearance is a function of hepatic blood flow ($Q_H$) and the
extraction efficiency of the liver for the drug ($E_H$) (Tozer and Rowland, 2006):

$$Cl_H = Q_H \cdot E_H \qquad \text{(Eq. 1.10)}$$

The hepatic extraction efficiency can range from 0 (when the liver is incapable of
removing the drug) to 100% (when the liver extracts all of the drug presented to it
in a given pass). Moreover, $Cl_H$ is equal to systemic clearance only when the
drug is cleared completely by the liver after intravenous administration (Burton et
al., 2006).
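The dependence of hepatic clearance on blood flow and extraction efficiency can be sketched as below, assuming the standard relationship in which hepatic clearance is the product of hepatic blood flow and the extraction efficiency expressed as a fraction. The blood flow of 1450 ml/min is the sum of the portal and arterial flows quoted from Taft (2009) in Section 1.4; the two extraction values are hypothetical.

```python
def hepatic_clearance(q_h_ml_min: float, extraction_ratio: float) -> float:
    """Cl_H = Q_H * E_H, with E_H expressed as a fraction between 0 and 1."""
    if not 0.0 <= extraction_ratio <= 1.0:
        raise ValueError("extraction_ratio must lie between 0 and 1")
    return q_h_ml_min * extraction_ratio


# Hepatic blood flow: 1100 ml/min (portal vein) + 350 ml/min (hepatic artery).
Q_H = 1100.0 + 350.0

# Hypothetical low- and high-extraction drugs.
low_extraction = hepatic_clearance(Q_H, 0.1)
high_extraction = hepatic_clearance(Q_H, 0.9)
```

For a high-extraction drug the clearance approaches the hepatic blood flow itself, which is why the clearance of such drugs is sensitive to changes in blood flow.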


The amount of circulating drug presented to the liver enzymes and cleared from the
blood depends on the rate of hepatic blood flow ($Q_H$), binding to circulating
proteins, and the metabolic activity and biliary excretion involved in the hepatic
elimination of the compound (Nassar et al., 2009). The hepatic intrinsic clearance
of unbound drug ($Cl_{u,int}$) indicates the maximal ability of the hepatocytes to
remove drug from the liver. In most cases $Cl_{u,int}$ will exceed the hepatic
clearance of the total drug. The hepatic intrinsic clearance of unbound drug is
frequently related to metabolic activity, which is often assumed to be the
rate-limiting step in hepatic elimination:

$$Cl_{u,int} = \frac{V_{max}}{K_m + C_u} \qquad \text{(Eq. 1.11)}$$

where $V_{max}$ is the maximum rate of the reaction for the enzyme involved in the
metabolism of the substrate, $K_m$ is the substrate concentration at which the
metabolic rate is half of $V_{max}$, and $C_u$ is the concentration of unbound drug
at the enzyme site in the liver (Burton et al., 2006).
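The concentration dependence implied by Eq. 1.11 can be explored with a short Python sketch. The enzyme parameters are hypothetical (Vmax in nmol/min and Km in nmol/ml, so that the result is in ml/min); the three concentrations are chosen to show the near-linear, half-saturated and saturated regimes.

```python
def intrinsic_clearance(v_max: float, k_m: float, c_u: float) -> float:
    """Eq. 1.11: Cl_u,int = Vmax / (Km + Cu)."""
    if k_m + c_u <= 0:
        raise ValueError("Km + Cu must be positive")
    return v_max / (k_m + c_u)


# Hypothetical enzyme parameters: Vmax = 100 nmol/min, Km = 10 nmol/ml.
V_MAX, K_M = 100.0, 10.0

linear = intrinsic_clearance(V_MAX, K_M, c_u=0.1)        # Cu << Km: near Vmax/Km
at_km = intrinsic_clearance(V_MAX, K_M, c_u=10.0)        # Cu = Km: half of Vmax/Km
saturated = intrinsic_clearance(V_MAX, K_M, c_u=1000.0)  # Cu >> Km: much lower
```

When the unbound concentration is well below Km, the intrinsic clearance is approximately constant at Vmax/Km; as the enzyme saturates, intrinsic clearance falls.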


1.3.3. Elimination by Other Sites

Besides the major routes of excretion (bile and kidney), excretion can also take
place through other routes, such as the lungs, saliva, sweat, faeces, breast milk
and hair. The lungs play the main role in the excretion of some xenobiotics that
exist in the gaseous phase in the blood (Haschek et al., 2010).

In breastfeeding mothers, unchanged drugs, drug metabolites and toxicants can be
excreted into the milk. As the pH of milk is slightly acidic, at about 6.5, basic
compounds are excreted into the milk to a greater extent than acidic compounds.
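The preference of milk for basic compounds can be illustrated with a simple pH-partition (ion-trapping) calculation based on the Henderson-Hasselbalch equation. This sketch is not part of the original analysis: it assumes that only the un-ionised form crosses the membrane, that plasma pH is 7.4, and that protein binding and partitioning into milk fat are ignored. The two pKa values are hypothetical.

```python
def milk_plasma_ratio(pka: float, is_base: bool,
                      ph_milk: float = 6.5, ph_plasma: float = 7.4) -> float:
    """Predicted milk/plasma ratio of unbound drug from pH partitioning alone.

    For a base, ionised/un-ionised = 10**(pKa - pH); for an acid it is
    10**(pH - pKa). The un-ionised concentration is assumed equal on both sides.
    """
    sign = 1.0 if is_base else -1.0
    total_over_unionised_milk = 1.0 + 10.0 ** (sign * (pka - ph_milk))
    total_over_unionised_plasma = 1.0 + 10.0 ** (sign * (pka - ph_plasma))
    return total_over_unionised_milk / total_over_unionised_plasma


# Hypothetical weak base (pKa 9.0) and weak acid (pKa 4.0).
base_ratio = milk_plasma_ratio(9.0, is_base=True)
acid_ratio = milk_plasma_ratio(4.0, is_base=False)
```

Under these assumptions the weak base is concentrated in milk (ratio above 1), whereas the weak acid is largely excluded (ratio below 1), in line with the statement above.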

In cases of extensive sweating, elimination through sweat may need to be
considered. Iron, cadmium, zinc and some other metals can be excreted in sweat
(Hale et al., 2002).

Faeces can be the main route of elimination for any drug that is not absorbed in
the small intestine or not reabsorbed via the enterohepatic circulation.

1.4. Function of the Liver and its Role in Drug Elimination

The liver is the largest internal organ in the body; it is relatively larger in
infancy, comprising one-eighteenth of the birth weight (Sherlock and Dooley, 2008).
The liver is divided into right and left lobes, with most of its mass found in the
right lobe. Anatomically, this exocrine and endocrine organ is situated in a
strategic position between the pancreas, gastrointestinal tract and spleen (Figure
1.2). The entire surface of the liver is covered by a capsule that contains nerves
which can sense pain (Sherlock and Dooley, 2008).

The gallbladder is located under the liver (Figure 1.2). The liver has a double
blood supply: the portal vein brings venous blood from the intestine and spleen,
and the hepatic artery, arising from the celiac axis, supplies the liver with
arterial blood. Branches of both the hepatic artery and the hepatic portal vein
carry blood into the liver sinusoids, where oxygen, most of the nutrients and
toxins are taken up by the hepatocytes (Tortora and Derrickson, 2006). The liver
receives approximately 1100 ml/minute of blood from the portal vein and 350
ml/minute from the hepatic artery (Taft, 2009).

The liver acts as a detoxifier to protect the general blood circulation from toxins
that are absorbed through the gastrointestinal tract. This is done through
metabolism and excretion through bile. Moreover, the liver is responsible for
maintaining adequate blood sugar concentrations. Blood from the pancreas, which is
rich in glucagon and other hormones, and blood from the spleen, which contains the
metabolites from red blood cell breakdown, pass through the liver via the portal
vein for detoxification. Apart from the production of bile and metabolism,
hepatocytes perform other important functions, such as destroying bacteria by means
of peroxisomes and lysosomes. A hepatocyte can contain 800-1000 mitochondria. In
addition, hepatocytes have abundant rough and smooth endoplasmic reticulum. The
smooth endoplasmic reticulum produces lipids and catabolises oestrogen,
progesterone and testosterone. The rough endoplasmic reticulum synthesises plasma
proteins such as albumin from amino acids and then returns them to the space of
Disse (You and Morris, 2007). Other functions of hepatocytes are the synthesis of
alpha and beta globulins, plasma proteins, coagulation factors, very low density
lipoprotein (VLDL), low density lipoprotein (LDL) and high density lipoprotein
(HDL). Activation of vitamin D is another essential function of hepatocytes (Pocock
and Richards, 2009).

Figure 1.2. Bile release into the duodenum: 1. Hepatic lobule, 2. Left hepatic duct,
3. Right hepatic duct, 4. Common hepatic duct, 5. Cystic duct, 6. Gall bladder, 7.
Stomach, 8. Pancreatic duct, 9. Pancreas (adapted from Guyton and Hall, 2006).


1.4.1. Biliary Excretion of Drugs 

The functional unit of the liver is known as the lobule. Figure 1.3 shows the
structure of the liver lobule. A lobule is defined at the histological scale and,
in terms of blood flow, involves branches of the portal vein and hepatic artery and
a central vein. Blood from the branches of the portal vein and hepatic artery
eventually mixes in the sinusoids, where the mixed blood moves from the periphery
to the centre of the lobule. A lobule is typically a hexagonal (six-sided)
structure that consists of specialised epithelial cells called hepatocytes,
arranged in irregular, branching, interconnected plates around a central vein
(Tortora and Derrickson, 2006). In addition, the liver lobule contains highly
permeable capillaries called sinusoids, through which blood passes. Also present in
the sinusoids are fixed phagocytes named Kupffer cells, which destroy worn-out
white and red blood cells, bacteria and other foreign matter in the venous blood
draining from the gastrointestinal tract.

Plasma leaks from the sinusoids into the area adjacent to the hepatocytes, which is
called the space of Disse. The plasma there is well exposed to the hepatocytes,
which can therefore efficiently exchange chemicals with it. For example, a toxin in
the plasma can be detoxified, or extra glucose can be converted to glycogen, by the
hepatocyte and then returned to the space of Disse. The central vein is situated in
the centre of each lobule; blood from the portal vein and hepatic artery passes to
the central vein through the sinusoids (Guyton and Hall, 2006).

One face of each hepatocyte is towards the blood (via the space of Disse) and the
other faces neighbouring hepatocytes. This means that hepatocytes lie back to back,
and bile is secreted by the hepatocytes into the space between them, the bile
canaliculus, and then flows to the bile duct. Excreted bile, unlike the blood,
moves away from the centre of the lobule towards the periphery (Figure 1.3). The
resulting bile drains into branches of intrahepatic bile ductules that converge on
the common hepatic bile duct (Matsumoto and Nakamura, 1992). Finally, the left
hepatic duct and the right hepatic duct join to form the common hepatic duct. The
common hepatic duct is connected to the gallbladder through the cystic duct. In a
healthy man, the gallbladder stores about 50 ml of bile; during storage the bile
becomes more concentrated, which increases its potency and intensifies its effect
on fats (Guyton and Hall, 2006). Uptake of bile salts from sinusoidal blood and
their subsequent secretion across the canalicular hepatocyte membrane are the major
factors controlling the rate of bile secretion. The bile secreted by hepatocytes
enters the bile canaliculi, narrow intercellular canals that empty into small bile
ductules. The ductules pass into bile ducts at the periphery of the lobules. The
bile ducts merge and eventually form the larger right and left hepatic ducts, which
unite and exit the liver as the common hepatic duct.


Figure 1.3. Schematic representation of bile duct and blood flow in lobule organisation.
The lobule is the basic functional unit of the liver. The liver lobule is constructed around
a central vein, which empties into the hepatic vein; 1. Branch of hepatic artery, 2. Branch
of portal vein, 3. Space of Disse, 4. Hepatocyte, 5. Bile canaliculus, 6. Central vein,
7. Sinusoid (adapted from Guyton and Hall, 2006).

Figure 1.4. The cartoon depicts substrate transport processes in the hepatocyte,
including efflux (E) and uptake (U) transport of drugs/drug-like compounds and their
metabolites by sinusoidal and canalicular proteins. 1. Sinusoidal membrane, 2. Ito cell,
3. Space of Disse, 4. Hepatocyte, 5. Mitochondria, 6. Nucleus, 7. Endoplasmic reticulum,
8. Lysosomes, 9. Peroxisome (Sharifi and Ghafourian, 2014).


The liver plays a key role in drug elimination via bile. The liver is able to secrete up
to 1 litre of bile per day, which accumulates in the gallbladder and can be emptied into
the duodenum for the digestion of food (Pandit, 2007). The most important components of
bile are conjugated bilirubin, phospholipids such as lecithin, IgA antibodies,
cholesterol and bile salts such as those of cholic acid and chenodeoxycholic acid. Bile
acids are some of the most important substances in bile and are vital for efficient
digestion and emulsification of lipids. Most bile acids originate from the
recirculating pool (Dawson et al., 2009). Bile acids are also synthesized by the liver
from cholesterol.

Canalicular bile secretion is an osmotic process in which active excretion of organic
solutes into the bile canaliculus is the main driving force for the passive inflow of
water, electrolytes and nonelectrolytes from hepatocytes (Trauner and Boyer,
2003). Several different types of transporter proteins are involved in the uptake of
compounds from the blood into hepatocytes, and others are responsible for efflux of
the compounds from the hepatocyte into the canaliculus across the canalicular membrane.
These proteins are located in the basolateral and canalicular membranes of the
hepatocytes, and their substrates include chemically diverse metabolites and
unchanged drugs. Figure 1.4 shows the main transport proteins in hepatocytes that are
responsible for the uptake of compounds from plasma and their excretion out of the
cells. While products of the multidrug resistance (MDR) gene family, namely the bile
salt export pumps Bsep (rat) and BSEP (human), transport monovalent bile salts
(Rollins and Klaassen, 1979), excretion of non-bile salt organic anions and divalent
sulphate or glucuronide bile salts is carried out mainly by the multidrug resistance
protein 2 (MRP2) and P-glycoprotein. The bile salt export pump has a limited role in
drug excretion (Morgan et al., 2010). The transporter proteins responsible for biliary
excretion are described in Section 1.5.

Chemical structure, polarity and molecular size, as well as characteristics of the
liver such as specific active transport sites within the liver cell membranes, are the
main factors that determine elimination via the biliary tract (Rollins and
Klaassen, 1979). Apart from physico-chemical factors, species, strain, gender
differences and diet can also play a role in hepatic elimination. For instance, the
expression and activity of hepatic BCRP are sex-dependent, being higher in males in
both mice and humans (Merino et al., 2005a). Another interesting fact is that hepatic
MRP2 expression in rats is nearly 10-fold higher than in humans (Li et al., 2008);
moreover, species differences in the substrate specificities of transporters are not
negligible (Takekuma et al., 2007).

1.4.2. Metabolism of Drugs

The liver is an important site of metabolism for various compounds, including
drugs. Metabolism, or biotransformation, is a major route of elimination for many
drugs. Drug metabolism often converts lipophilic compounds into more polar
products. Carbohydrates, fats and proteins are all broken down by hepatic
enzymes. A healthy liver detoxifies many harmful substances (Gibson and
Skett, 2001), but liver diseases (e.g. cirrhosis, cholestasis and carcinoma) can
affect drug metabolism and biliary clearance (Paintaud et al., 1996). Studies of the
biliary excretion of some extensively metabolised drugs show that many patients
with liver dysfunction can metabolise and excrete drugs normally, while other
patients have decreased metabolism and biliary excretion rates (Hvidberg et al.,
1974; Adjepon-Yamoah et al., 1974).

A thorough understanding of the metabolic pathway of a drug is important in
characterizing its pharmacokinetic profile (Kwon, 2001). Figure 1.5 shows the
biotransformation of drugs as an elimination pathway. Metabolism is usually
catalysed by enzymes that can be found in most organs, especially the liver. If
metabolism of a compound by one enzyme is blocked due to substrate saturation or
by structural modifications, the compound can be metabolized by other types of
enzymes (Kerns and Di, 2008). Drug metabolism, or biotransformation, is
traditionally divided into two categories: phase I and phase II reactions (Williams,
1959). Phase I metabolism introduces functional groups into molecules and is
therefore also known as functionalization. Phase II reactions are conjugation
reactions with various endogenous compounds. Cytochrome P450 monooxygenases and
nitro and azo reductases are some of the main phase I enzymes, while important
phase II enzymes include the transferases that conjugate D-glucuronic acid,
glutathione and sulfate (Tsaioun and Kates, 2011). Phase I and II reactions normally
produce more polar compounds with higher aqueous solubility.

Figure 1.5. Drug biotransformation (Katzung et al., 2004).


1.5. Elimination by Membrane Transporters 

Influx and efflux transporters are proteins expressed in cell membranes that have
been shown to have a significant effect on the absorption, distribution and
elimination of drugs. In the past ten years, there has been an enormous increase in the
literature regarding the role of membrane transporters in governing drug
pharmacokinetics and response. An evaluation of the contribution of transporters to
total tissue uptake and excretion is necessary to understand the drug disposition
route (Giacomini et al., 2010). Membrane transporters are classified according to
their mode of transport, energy coupling mechanism, molecular phylogeny and
substrate specificity. Transporter categories include channels (e.g. Escherichia coli
GlpF glycerol channel), primary active transporters (e.g. Lactococcus lactis LmrP
multidrug efflux pump), ABC transporters (e.g. P-gp in humans and
microorganisms), secondary transporters (e.g. E. coli LacY lactose permease) and
group translocators (e.g. E. coli MtlA mannitol transporter) (Ren and Paulsen,
2005). Terada and co-workers have classified drug transporters into five main
groups based mainly on their functions. These are: 1. Peptide transporters (PEPT),
2. Organic anion-transporting polypeptides (OATP), 3. Organic ion transporters
(OAT, OCTN and OCT), 4. H+/organic cation antiporters (MATE) and
5. ATP-binding cassette (ABC) transporters (mainly P-gp, MRP1 and BCRP). The
structures of these transporters, their distribution in tissues and their roles differ.
In vivo and in vitro techniques can be used to characterise transporters
(Terada et al., 2006).
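
For readers who want to use this five-group classification in a script (for example,
to annotate compounds by the transporter group they interact with), it can be held in a
simple lookup table. The sketch below is illustrative only: the group names and members
are those listed in the text, while the identifiers DRUG_TRANSPORTER_GROUPS and group_of
are hypothetical and do not come from the cited work.

```python
# Illustrative lookup of the five drug-transporter groups named in the text
# (after Terada et al., 2006). The variable and function names are hypothetical.
DRUG_TRANSPORTER_GROUPS = {
    "Peptide transporters": ["PEPT"],
    "Organic anion-transporting polypeptides": ["OATP"],
    "Organic ion transporters": ["OAT", "OCTN", "OCT"],
    "H+/organic cation antiporters": ["MATE"],
    "ATP-binding cassette (ABC) transporters": ["P-gp", "MRP1", "BCRP"],
}


def group_of(transporter: str) -> str:
    """Return the group that lists the given transporter abbreviation."""
    for group, members in DRUG_TRANSPORTER_GROUPS.items():
        if transporter in members:
            return group
    raise KeyError(f"{transporter!r} is not listed in any group")


if __name__ == "__main__":
    print(group_of("OCTN"))
```

Only the abbreviations given in the text are included, so a lookup for any other
transporter raises a KeyError rather than guessing a group.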

Various transporters have been implicated in the clearance of several compounds
and metabolites. Transporters are known to be partially responsible for drug
concentration ratios in plasma and tissues, and thus for efficacy and toxicity. A large
proportion of intact drug molecules and their metabolites is excreted into the bile by
efflux transporters and by passive diffusion into the bile canaliculus (Niemi et al.,
2011). Transporters can be found in all tissues, but the four major locations where
transporters operate significantly are the intestinal epithelia, hepatocytes, kidney
proximal tubules and the blood-brain barrier (Giacomini et al., 2010). Figure 1.6
gives a schematic representation of the important transporters and their
positions in the membrane domains of different organs, such as the sinusoidal
membranes of hepatocytes. As seen in this figure, several uptake and efflux
membrane transporters, including apical ATP-dependent efflux pumps (P-gp, MRPs and
BCRP), the organic anion-transporting polypeptide (OATP) family, the ileal
sodium-dependent bile acid transporter (ASBT), the organic cation transporter (OCT)
family, peptide transporters (PEPTs), organic cation/carnitine transporters
(OCTNs), multidrug and toxin extrusion proteins (MATEs) and the urate transporter,
govern the transport of compounds into and out of the cells.

Figure 1.6. The cartoon illustrates selected human transport proteins in the plasma
membrane domains of intestinal epithelia (a), hepatocytes (b), kidney proximal
tubules (c) and brain capillary endothelial cells (d) (adapted from Giacomini et al.,
2010).


1.5.1. Peptide Transporters (PEPT) 

The currently known peptide transporters include peptide transporters 1 and 2
(Pept1 and Pept2) and peptide/histidine transporters 1 and 2 (PHT1 and PHT2).
Studies showed that Pept1 is a low-affinity, high-capacity transport system for
di- and tripeptides (Leibach and Ganapathy, 1996). Conversely, Pept2 is a
high-affinity, low-capacity transporter for di- and tripeptides. PHT1 and PHT2
transport di- and tripeptides as well as histidine. These transporters are
stereoselective, as they show affinity for the L-enantiomers of amino acids (Doring et
al., 1998).

Pept1 was first cloned from rabbit intestinal epithelium (Fei et al., 1994).
Pept2 was first cloned from human kidney (Liu et al., 1995). Pept1 and
Pept2 can transport many peptides with different sizes and charges, but not long
peptides with more than four peptide bonds (Daniel, 2004). These transporters are
found mostly in the small intestine and the proximal tubules of the kidney, where they
mediate absorption of certain drugs, e.g. cephalosporins and other beta-lactam
antibiotics. There is no evidence for the existence of these peptide transporters at
the blood-brain barrier (BBB) (Han et al., 1998). However, Pept1 expression was found
at low levels in the liver, in addition to the major sites, the small intestine and
kidney (Liang et al., 1995). More recently, H+-peptide cotransport has been
established in the human bile duct epithelium cell line SK-ChA-1 (Knutter et al., 2002).

Human PHT1 and PHT2 were found to be expressed at low levels, with mRNA expression
throughout the gastrointestinal tract. In addition, mRNA expression was demonstrated
in the liver, brain, colon, heart, kidney, lung, ovary, pancreas, placenta, prostate,
spleen and testis (Herrera-Ruiz et al., 2001).

In the past decade, amino acid modifications have been used in the design of
prodrugs to allow PEPT1- and PEPT2-mediated intestinal absorption of poorly absorbed
drugs such as the antiviral agents levovirin and azidothymidine, and the anticancer
drugs gemcitabine and floxuridine (Sugawara et al., 2000; Li et al., 2006).

1.5.2. Organic Anion-Transporting Polypeptides (OATP)

OATP is a family of membrane transporters that mediate the cellular uptake of
endogenous substrates and drugs. The importance of OATPs in excretion has been
shown by different studies (Cvetkovic et al., 1999; Mikkaichi et al., 2004; Kim,
2003). The human OATP family consists of 11 members: OATP1A2, 1B1, 1B3,
1C1, 2A1, 2B1, 3A1, 4A1, 4C1, 5A1 and 6A1 (Hagenbuch and Meier, 2003). As
seen in Figure 1.6, members of this family can be found in the sinusoidal (basolateral)
membrane of hepatocytes, the basolateral membrane of proximal tubules, and the apical
(luminal) side of the blood-brain barrier and intestinal epithelia. Certain OATP
isoforms are selectively involved in hepatic uptake of hydrophobic anions from the
plasma (Taft, 2009). Although the role of OATPs in renal (Sekine et al., 2006) and
hepatic excretion (Nozawa et al., 2005), as well as in uptake across the blood-brain
barrier (Gao et al., 2000) and gastrointestinal tract (Sai et al., 2006), has been
demonstrated, their importance in pharmacokinetics is still not fully understood
(Glaeser and Kim, 2006).

Despite the name, OATP substrates are not limited to organic anions, but also
include cations as well as neutral and zwitterionic compounds (Niemi et al., 2011).
The OATP family members mediate the sodium-independent transport of various organic
compounds, including organic dyes, bile salts, steroid conjugates and thyroid hormones.
In rat, the organic anion-transporting polypeptides Oatp1, Oatp2 and Oatp4 have
been indicated as the main sodium-independent uptake proteins (Kullak-Ublick et
al., 2000).

OATPs are proteins with twelve transmembrane domains (Hagenbuch and
Gui, 2008). The first of the organic anion-transporting polypeptides, OATP1A2
(OATP1), was originally cloned from a human kidney cDNA library (Lu et al.,
1996). Later, OATP1A2 was cloned from rat liver and since then, several different
forms of OATPs in human and rodents have been discovered (Jacquemin et al.,
1994). For instance, OATP1B1 was cloned independently by different laboratories
(Tirona et al., 2001; Hsiang et al., 1999; Konig et al., 2000a; Abe et al., 1999).
OATP1B3 was also cloned from human liver (Abe et al., 2001; Konig et al.,
2000b). OATP1B3 is mainly expressed in the basolateral membrane of hepatocytes
(Abe et al., 2001), but it has also been detected in certain cancer cell lines and
tissues (Abe et al., 2001). Over the last two decades, the impact on drug
pharmacokinetics of the organic anion-transporting polypeptides expressed on the
sinusoidal membrane of the hepatocyte (OATP1B1, 1B3 and 2B1) has been increasingly
recognized.

Human OATP1B1 (also known as OATP2) is a liver-specific transporter that is
expressed on the sinusoidal membrane of human hepatocytes and mediates the
hepatic uptake of many endogenous compounds. The substrate specificity of
OATP1B1 is closely comparable to that of OATP1A2, and both can transport compounds
such as eicosanoids, benzylpenicillin, methotrexate, rifampin, pravastatin,
rosuvastatin and cerivastatin (Glaeser and Kim, 2006). Apart from hepatocytes,
OATP1A2 is expressed in various tissues including the brain and kidneys. Moreover,
OATP1A2 can facilitate the entry of its substrates through the duodenal wall into the
circulation (Glaeser et al., 2007). With regard to the acidic, basic and neutral
character of its substrates, OATP1A2 possesses perhaps the broadest spectrum among
the members of the superfamily (You and Morris, 2007).

OATP1B3 has a significant substrate overlap with OATP1B1 (Karlgren et al.,
2012a). However, OATP1B3 is also able to transport oligopeptide hormones such
as cholecystokinin 8 (Ismair et al., 2001), as well as digoxin (Kullak-Ublick et al.,
2001), although the latter has been disputed (Taub et al., 2011). Unlike OATP1B1,
OATP1B3 has been implicated in the transport of the angiotensin II receptor antagonist
telmisartan and its glucuronide conjugate (Abe et al., 1999), as well as in mediating
the cellular uptake of opioid peptide II, digoxin and ouabain (Kullak-Ublick et al.,
2001). The importance of OATP1B1 and OATP1B3 in hepatic transport has been
highlighted by Fenner and co-workers, who indicated that OATP1B-mediated transport
can be the rate-determining step of hepatobiliary drug clearance (Fenner et al., 2012).

Beyond the role of OATPs in drug clearance, recent studies have suggested that
overexpression of OATP1A2, OATP1B1 and OATP1B3 in pancreatic cancer tissues (Kounnis
et al., 2011) as well as in ovarian cancer cells (Svoboda et al., 2011) may be exploited
in the design of novel targeted cancer therapy (Sainis et al., 2010). This is
particularly important in light of the increasing global burden of cancer.
GLOBOCAN 2008 (Ferlay et al., 2010) estimated that over 12.7 million cancer cases
and 7.6 million cancer deaths occurred in 2008, and deaths from cancer worldwide are
projected to continue rising, with an estimated 13.1 million deaths in 2030 (Jemal et
al., 2011).

1.5.3. Organic Ion Transporters (OAT, OCTN and OCT)

Organic anion and cation transporters (OATs and OCTs) and the organic
cation/carnitine transporters (OCTNs) are members of solute carrier
family 22 (SLC22). These transmembrane proteins are largely expressed
in excretory organs such as the kidney and liver, as a major component of the human
xenobiotic excretion machinery. In the liver, these uptake transporters play an
important role in the initial sinusoidal influx of drugs into hepatocytes (van
Montfoort et al., 2003) (see Figure 1.6). These transporters have wide substrate
specificities for a range of exogenous and endogenous substrates, including many
commonly used drugs such as antibiotics, antihypertensives and anti-inflammatories,
among others (Leabman et al., 2003).

In the kidneys, organic cation transporters mediate the transport of small organic
cations such as tetraethylammonium. OCT1, isolated from rat kidney in 1994, was the
first OCT to be discovered (Grundemann et al., 1994). In humans, OCT1 is expressed at
extremely low levels in the kidney and is mainly found in the liver (Motohashi et al.,
2002). As seen in Figures 1.4 and 1.6, OCT1 is abundant in hepatocytes and
may be seen as the most important transporter for the distribution of cationic
compounds into the liver across the sinusoidal membrane (Nies et al., 2009). OCT2 was
isolated from rat kidney by cDNA cloning based on the OCT1 sequence (Okuda et
al., 1996). OCT2 is generally considered to be a kidney transporter, though its mRNA
is expressed at low levels in other tissues such as spleen, placenta, small intestine
and brain (Gorboulev et al., 1997). OCT3 has the widest tissue distribution of the
OCTs, and its protein expression has been confirmed on the basolateral membrane
of hepatocytes (Nies et al., 2009), the basal membranes of trophoblasts (Sata et al.,
2005), the apical membrane of enterocytes (Muller et al., 2005) and the luminal
membrane of lung epithelial cells (Lips et al., 2005). Substrates for OCT1-3 include
a wide range of structurally unrelated organic cations, including many drugs. An
extensive list of OCT1-3 substrates and inhibitors has been provided in a recent
review on the importance of organic cation transporters in drug therapy (Nies et al.,
2011). Among these substrates are catecholamines, monoamine neurotransmitters
and several antiviral drugs.

OATs are fairly well-studied organic anion transporters and are mainly expressed in
excretory organs, especially the kidney, for the uptake of organic anions from the
blood into renal tubule cells (see Figure 1.6). OATs are membrane proteins with 12
putative membrane-spanning domains and function as sodium-independent exchangers or
facilitators. OATs mediate the influx of a wide range of anions, including
inorganic ions (e.g. Cl- and HCO3-), endogenous anions (e.g. cyclic nucleotides,
prostaglandins, urate, dicarboxylates) and exogenous anions (various anionic drugs
and environmental substances) (Sekine et al., 2000). In comparison with OATPs,
substrates of OATs have been suggested to be generally of lower molecular weight
(Roth et al., 2012). The transport mechanism of OAT1 and OAT3 is known to be
indirectly sodium-dependent and involves a 'tertiary active transport' mechanism to
move organic anions across the basolateral membrane into the proximal tubule
cells. The primary active Na+/K+-ATPase located on the basolateral membrane
pumps Na+ from the intracellular to the extracellular space to maintain a Na+ gradient
(Glaeser and Kim, 2006). This gradient is used by the secondary active
Na+-dicarboxylate cotransporter to maintain a high intracellular concentration of
α-ketoglutarate, which in turn is used to drive uptake of other organic anions by OAT1
and OAT3. Several studies have revealed that rat Oat1 transports a broad spectrum of
substrates (Glaeser and Kim, 2006). Endogenous organic anions such as prostaglandins,
cyclic nucleotides and folates (Sekine et al., 1997) and some xenobiotics such as
beta-lactam antibiotics (Jariyawat et al., 1999; Leabman et al., 2003), NSAIDs
(Apiwattanakul et al., 1999) and many antiviral drugs (Cihlar et al., 1999; Wada et
al., 2000) are examples of compounds transported by rat Oat1. Human OAT1 also
transports adefovir, cidofovir, zidovudine (AZT), acyclovir and ganciclovir (Cihlar et
al., 1999; Ho et al., 2000).

OAT2 mRNA has the highest expression levels in the liver, with lower levels also
seen in the kidney (Sekine et al., 1998; Sun et al., 2001; Hilgendorf et al., 2007).
Human OAT3 is exclusively expressed in the basolateral membrane of the proximal
tubule cells of the kidneys (Cha et al., 2001; Sun et al., 2001), while in rat, Oat3 is
most abundantly expressed in the liver and to a lesser extent in the kidney and brain
(Kusuhara et al., 1999). OAT4 mRNA is expressed in kidney and placenta (Bleasby et al.,
2006). OAT5 expression in humans is not well studied, although Northern blot analysis
demonstrates mRNA expression in the liver (Sun et al., 2001). OAT7 has been
shown to be exclusively expressed in the liver, where its expression has been
localized to the basolateral membrane of hepatocytes (Shin et al., 2007). OAT10
mRNA has the highest expression levels in the kidney, followed by brain, heart,
small intestine and colon (Bahn et al., 2008). URAT1 is expressed in the kidney, and it
is the only member of the OAT family for which mutations have been linked to a
disease (Enomoto et al., 2002).

Carnitine is an essential zwitterionic cofactor that plays an important role in the
metabolism of lipids and subsequently in the production of energy. Carnitine is
absorbed in the small intestine with the help of organic cation/carnitine
transporter 2 (OCTN2), which is located on the brush border membrane (Elimrani
et al., 2003). OCTN2 transports organic cations without involving Na+, but it
transports carnitine only in the presence of Na+. Wu and colleagues found that
OCTN1 is expressed in a wide variety of rat tissues and organs such as intestine,
liver, kidney, heart and brain (Wu et al., 2000). OCTN2 is also expressed in the
heart, kidney, placenta and brain (Wu et al., 1999). There is no evidence for the
presence of OCTN2 in human liver, while it is strongly expressed in rat liver (Tamai
et al., 1998).

1.5.4. H+/Organic Cation Antiporter (MATE)

Multidrug and toxin extrusion (MATE) transporters mediate cellular efflux of a
variety of organic cations, including many drugs (Lickteig et al., 2008). MATE1,
which functions as a drug/sodium antiporter, is the first example of a Na+-coupled
multidrug efflux transporter (Morita et al., 2000). MATE proteins are
transporters that are primarily expressed in the kidney and liver, localized at the
apical membranes of the renal tubules and bile canaliculi (Motohashi and Inui,
2013; Motohashi et al., 2013). MATE1 has been isolated as an H+/organic cation
antiporter located at the renal brush-border membranes (Asaka et al., 2007).
MATE1 can transport zwitterionic drugs such as fexofenadine and levofloxacin, as
well as organic cation drugs such as metformin and cimetidine (Terada et al., 2006;
Masuda et al., 2006).

In rat, apart from the kidney, MATE1 is also expressed abundantly in the placenta and
slightly in the spleen, but it is not expressed in the liver (Terada et al., 2006);
other studies likewise found rat MATE1 to be expressed in the kidney but not in the
liver (Ohta et al., 2006; Masuda et al., 2006). In humans, MATE1 mRNA
levels are highest in the liver, where the protein is localized to the canalicular
membrane of hepatocytes. MATE1 mRNA expression is also high in the kidneys, where it is
localized to the apical membrane of the renal tubule. Similarly, MATE2 mRNA
levels are by far the highest in the kidneys, while relatively low in most other
tissues (Lickteig et al., 2008).

1.5.5. ABC Transporters

ATP-binding cassette (ABC) transporters are transmembrane proteins that utilize
the energy of adenosine triphosphate (ATP) binding and hydrolysis to carry out
certain biological processes, including translocation of various substrates across
membranes. These are mainly efflux transporters that help export compounds out of
the cells (Massey et al., 2014). ABC transporters form one of the largest transporter
superfamilies; they may be found in all known organisms, and around 1100 different
transporters belong to this group (You and Morris, 2007). Figure 1.6 illustrates
several members of the ABC transporters in brain, kidney, intestine and liver. In the
liver, the ABC transporters MRP2, BCRP, P-gp and BSEP (ABCB11, also
known as sPgp, sister of P-glycoprotein) are found in the canalicular membrane of
hepatocytes, exporting substrates into the bile. Other members of the ABC
transporter family, including MRP3, MRP4 and MRP6, are distributed in the sinusoidal
membrane and export substrates from hepatocytes back into the blood.
ABC transporters can be found in many normal tissues, with an important role in
drug elimination or other biological processes.
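
The membrane localisation described above determines where a hepatic ABC transporter
sends its substrate. A minimal sketch of that mapping is given below; it encodes only
the transporters named in this paragraph, and the names HEPATIC_ABC_LOCALISATION and
efflux_destination are hypothetical, introduced purely for illustration.

```python
# Hepatic ABC efflux transporters grouped by membrane domain, as listed in the text.
# Identifiers are hypothetical; only transporters named in the paragraph are included.
HEPATIC_ABC_LOCALISATION = {
    "canalicular": {"MRP2", "BCRP", "P-gp", "BSEP"},
    "sinusoidal": {"MRP3", "MRP4", "MRP6"},
}

# Compartment that receives the substrate exported across each membrane domain.
DESTINATION = {"canalicular": "bile", "sinusoidal": "blood"}


def efflux_destination(transporter: str) -> str:
    """Return 'bile' or 'blood' for a hepatic ABC transporter named in the text."""
    for membrane, members in HEPATIC_ABC_LOCALISATION.items():
        if transporter in members:
            return DESTINATION[membrane]
    raise KeyError(f"{transporter!r} is not one of the listed hepatic ABC transporters")


if __name__ == "__main__":
    for name in ("MRP2", "MRP3"):
        print(name, "->", efflux_destination(name))
```

Keeping the membrane domain and the destination compartment in separate tables makes
it easy to extend the sketch to other tissues without changing the lookup function.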

Genetic defects in some of the ABC transporters may result in disease; mutations
in up to 14 mammalian ABC transporters (out of 48 ABC genes) have been
associated with disease states (Borst and Elferink, 2002). For example, dysfunction
of the ABCB2 transporter results in immune deficiency, and dysfunction of
ABCC2 results in Dubin-Johnson syndrome (Gottesman and Ambudkar, 2001).
These transporters are further categorised into seven distinct subfamilies of proteins
using phylogenetic analysis. The subfamilies include: ABCA (12 members), ABCB
(11 members), ABCC (12 members), ABCD (4 members), ABCE, ABCF (3
members) and ABCG (1 member) (Hennessy and Spiers, 2007). The best-studied
proteins of this family include P-gp (ABCB1), also known as MDR1 due to its
ability to produce multiple drug resistance in cancer cells, and the sulphonylurea
receptor (SUR) subfamily, encoded by members of the ABCC genes, which is involved in
regulating insulin secretion in β-cells of the pancreas (Dassa and Bouige, 2001).
Others include the member of the ABCC subfamily that encodes the cystic fibrosis
transmembrane conductance regulator (CFTR) protein, which plays a part in exocrine
secretion of chloride (Dean, 2002; Dassa and Bouige, 2001). A number of these
proteins, including MRP1, BCRP and P-gp, are reported to be overexpressed in
malignant cells, causing these cells to be resistant to drug therapy, hence the
multidrug resistance (MDR) terminology.

In eukaryotic cells, ABC transporters usually direct molecules from the cytoplasm
to the outside of the cell (Dean, 2002), with the main function of transporting
xenobiotic compounds out of the cell for transport to other areas of the body or for
excretion. On the other hand, ABC transporters in prokaryotic cells can be either
importers or exporters of compounds. Bacterial importers are important for cell
survival and typically import substrates such as iron, inorganic ions, peptides and
amino acids. Substances requiring removal from prokaryotic cells
include cell wall components such as lipopolysaccharides and toxins involved in
pathogenesis, e.g. haemolysin (Davidson et al., 2008).

1129 Structurally, ABC transporters consist of two distinct domains, the nucleotide 

1130 binding domain (NBD) and the transmembrane domain (TMD). A typical ABC 

1131 transporter may have two TMD domains and two NBD domains (Higgins, 2001). 

1132 The TMD of various ABC transporters is diverse and could contain 6-11 

1133 membrane-spanning a-helices and provides the specificity for the substrate in order 

1134 to function as the route for molecules to cross the membrane. The NBDs of the 

1135 protein, also known as the ATP-binding domain, can be found in the cytoplasm and 

1136 are consequently hydrophilic in nature (Dean, 2002). These domains help transfer 

1137 the energy needed to transport the substrate across the membrane (Dean, 2002; 

1138 Ambudkar et al., 2003). Each NBD consists of two subdomains: 1. ‘the catalytic core 

1139 domain’ that includes Walker motif A and Walker motif B with a dodecapeptide part 

1140 that connects the two Walker motifs, and 2. a smaller, structurally diverse α-helical 

1141 subdomain that contains the ABC signature motif. ABC transporter proteins bind 

1142 ATP through their NBDs and use the energy derived from this to transfer molecules 

1143 across cell membranes. A glutamine residue residing in a flexible loop called the Q 

1144 loop that connects the TMD and NBD is presumed to be involved in the interaction 

1145 of the NBD and TMD, particularly in the coupling of nucleotide hydrolysis to the 

1146 conformational changes of the TMD during substrate translocation. The H motif or 

1147 switch region contains a highly conserved histidine residue that is also important in 

1148 the interaction of the NBD domain with ATP. 

1149 

1150 1.5.5.1. ABC Transporters in Multidrug Resistance 

1151 During cancer treatment, tumour cells can become resistant to chemotherapy due to 

1152 increased efflux of drugs out of tumour cells or alteration of target proteins (Dean, 2002). 

1153 Pathways such as these can lead to multidrug resistance (MDR) thus contributing to 

1154 the failure of chemotherapy in malignant diseases. Multidrug resistance is the term 

1155 given to describe tumours developing resistance to two or more chemotherapeutic 

1156 drugs. This is the net result of the overexpression of membrane transporters that 

1157 actively remove toxic chemotherapeutic agents out of tumour cells (Sarkadi et al., 

1158 2006). ABC transporters have been widely associated with resistance and the ABC 

1159 genes ABCB1 (encoding P-gp), ABCC1 (encoding MRP1) and ABCG2 (encoding 

1160 BCRP) are the main genes that can be upregulated in cancerous cells. MRP1 is 

1161 expressed in epithelial cells and in non-malignant cells plays a role in protecting 

1162 kidney tissues, bone marrow and the intestinal mucosa from xenobiotics as well as 

1163 contributing to the removal of drugs from the cerebrospinal fluid (Schinkel and 

1164 Jonker, 2003). Moreover, MRP1 confers drug resistance to a range of cancer drugs 

1165 and transports conjugates of hydrophobic drugs as well as organic anions (Schinkel 

1166 and Jonker, 2003). P-glycoprotein (P-gp) was one of the first ABC transporters to 

1167 be associated with resistance (Leslie et al., 2009) and led to the discovery of other 

1168 genes in the ABC transporter family involved in multidrug resistance. P-gp is 

1169 highly expressed in cancerous tissues and it is reported to be involved in cancers of 

1170 the liver, colon and kidney tissues (Schinkel and Jonker, 2003). Breast Cancer 

1171 Resistance Protein (BCRP) was discovered after analysis of mitoxantrone-resistant 

1172 cell lines that did not over-express P-gp or MRP1 by Doyle et al (1998). It was first 

1173 cloned from a multidrug-resistant breast cancer cell line, hence the name. 

1174 In addition to chemotherapeutic agents, P-gp, BCRP and MRP1 also actively 

1175 transport non-cytotoxic drugs and xenobiotics (Matsson et al., 2009; Sharom, 2008; 

1176 Mao and Unadkat, 2005), thereby affecting the pharmacokinetics and tissue 

1177 distribution of these drugs. Table 1.1 gives a summary description of these 

1178 transporters. 

1179 Table 1.1. Properties of ABC transporters 


| Common names | Systematic name | Tissue localisation | Substrates | Inhibitors |
|---|---|---|---|---|
| P-gp / MDR1 | ABCB1 | Apical membranes of the intestine, liver, kidney, placenta and blood brain barrier (BBB) | Cancer drugs: anthracyclines, vinca alkaloids, taxanes, camptothecins, anthracenes and epipodophyllotoxins. Non-cancer drugs: digoxin | First generation inhibitors like verapamil; second generation inhibitors such as valspodar; third generation inhibitors like elacridar (GF120918) |
| MRP1 | ABCC1 | Basolateral membranes of all tissues, and possibly apical membrane of the BBB | Cancer drugs: anthracyclines, vinca alkaloids, camptothecins, epipodophyllotoxins and methotrexate. Other compounds: glutathione, sulphate and glucuronide conjugates | BSO, flavonoids, HIV protease inhibitors, non-HIV protease inhibitors, PAK-104P and MK571 |
| BCRP | ABCG2 | Apical membranes in the intestines, liver, immature stem cells, the brain, mammary glands and placenta | Cancer drugs: anthracyclines, camptothecins, epipodophyllotoxins, mitoxantrone, flavopiridol, methotrexate and bisantrene. Other compounds: drug and metabolite conjugates, food carcinogens like PhIP and other drugs | Flavonoids, fungal toxins like FTC, calcium channel blockers and tyrosine kinase inhibitors |

1180 Data from Sharom, 2008; van Herwaarden and Schinkel, 2006 and Gottesman et al., 2002 
1181 


1182 Apart from their role in MDR, transporter proteins encoded by ABCB1, ABCC 

1183 family (mainly MRP2) and ABCG2 have major functions in the pharmacokinetics 

1184 and tissue distribution of different drugs. As can be seen in Figure 1.6, P-gp, BCRP 

1185 and MRP2 are located in the apical membrane of intestinal epithelia and export the 

1186 substrate compounds from epithelial cells back into the lumen, while MRP3 is 

1187 located in the basolateral membrane and transports its substrates from cytoplasm 

1188 into the blood. The main ABC transporters in the kidney are P-gp, MRP2 and 

1189 MRP4, with an efflux role for active secretion of their substrates. P-gp, BCRP and 

1190 MRP2 are also involved in bile secretion through efflux of their substrates in the 

1191 canalicular membrane. P-gp, BCRP, MRP4 and MRP5 are the main ABC 

1192 transporters responsible for the efflux of compounds from the brain. Below is a 

1193 description of these ABC transporters in terms of their structure, binding and efflux 

1194 mechanisms, substrates, inhibitors and polymorphisms. 

1195 

1196 1.5.5.1.1. P-glycoprotein (ABCB1 Subfamily, MDR) 

1197 The schematic diagram of P-gp can be seen in Figure 1.7. This protein consists of 

1198 1280 amino acids forming 12 transmembrane segments. P-gp has an exceptionally 

1199 wide substrate specificity for cationic and lipophilic drugs. As a strong 

1200 efflux pump, P-gp is able to export a number of structurally 

1201 diverse compounds including anthracyclines, epipodophyllotoxins and vinca 

1202 alkaloids (Eckford and Sharom, 2009). 


Figure 1.7. Schematic diagram showing the structure of human P-gp with 1280 
amino acids and 12 transmembrane segments. Each loop in this topological view 
represents an amino acid residue (Adapted from Gottesman and Pastan, 1988). 

P-gp is expressed at many physiological barriers such as the intestinal epithelium, 
hepatocytes, renal proximal tubular cells, pancreatic and bile ductules, adrenal 
gland and the endothelial capillaries of the brain comprising the blood brain barrier 
(Kim et al., 1998; Thiebaut et al., 1987; Croop et al., 1989). This transport protein 
plays a significant role in different steps of absorption, distribution, metabolism and 
elimination of many compounds including anticancer drugs (Schinkel et al., 1995; 
Leveque and Jehl, 1995; Relling, 1996). In the membrane of hepatocytes, where it
is mostly expressed, P-gp is involved in the efflux of xenobiotics into the bile 
(Yu et al., 2010). In the gastrointestinal tract, P-gp pumps its substrates back into the 
intestinal lumen, so that they cannot access the portal vein to reach the 
systemic circulation (Schinkel et al., 1997). Therefore, P-gp can reduce the 
absorption and oral bioavailability of its substrate drugs. Moreover, it can be found 
in the blood-testis barrier (Melaine et al., 2002), blood brain barrier cells (Beaulieu et al., 
1997), the blood-mammary tissue barrier (Edwards et al., 2005), the blood-inner ear barrier 
(Saito et al., 1997), the placenta (Gil et al., 2005) and the endometrium of pregnant women 
(Arceci et al., 1988). A natural function of P-gp is that it prevents harmful 
chemicals or foreign compounds (xenobiotics) including drug molecules from 
getting into the brain and the placenta (Lin and Yamazaki, 2003). 
1225 P-gp is highly overexpressed in tumour cells and is able to bind and transport many 

1226 chemically and structurally unrelated drug molecules thus explaining its MDR 

1227 ability in cancer chemotherapy (Gottesman and Ambudkar, 2001). As a 

1228 consequence of P-gp blockage, e.g. in the presence of inhibitors, the intracellular 

1229 accumulation of the substrate drugs (chemotherapeutic agents) will increase which 

1230 may result in excessive toxicity of these drugs. However, the reduction of 

1231 chemotherapeutic dose is not a solution as it will reduce the overall efficacy 

1232 (Wacher et al., 1995; McDevitt and Callaghan, 2007; Wandel et al., 1999). An 

1233 example of this situation is when a drug molecule such as digoxin, which is a P-gp 

1234 substrate, is accumulated in the liver and kidney as a result of P-gp inhibitors 

1235 preventing the biliary and renal elimination of digoxin by active secretion with the 

1236 aid of P-gp efflux system (Hennessy and Spiers, 2007). 

1237 P-gp has a promiscuous binding site that can accept a wide range of chemically 

1238 unrelated substrates. The molecular weight of P-gp substrates varies widely, 

1239 from 250 to 1850 Da. In addition, the 

1240 substrate molecules can be acidic, zwitterionic, uncharged or positively charged 

1241 (Schinkel et al, 1997). Moreover, substrates can be amphipathic or hydrophobic 

1242 (Kerns and Di, 2008). In terms of the modulators of this multispecific transporter, 

1243 not only pharmaceutical drugs but also herbal products and some food components 

1244 can affect the function of P-gp as a transporter. It is therefore advisable that in drug 

1245 discovery, when a drug candidate is found to be a P-gp substrate, structure 

1246 modifications are applied to reduce the P-gp activity, leading to a better therapeutic 

1247 effect with fewer complications such as drug-drug interactions in drug discovery 

1248 projects (Kerns and Di, 2008). 

1249 The structure of human P-gp was first elucidated by electron microscopy and 

1250 image analysis (Rosenberg et al., 1997). P-gp was reported as having a central 

1251 core with an opening to the extracellular side of the membrane but closed 

1252 towards the cytoplasm. Recently, Aller et al. reported a medium resolution (3.8-4.4 

1253 Å) X-ray structure of P-gp that supported previous claims about the structure of 

1254 P-gp and revealed tentative binding sites for drug compounds (Aller et al., 2009). The 

1255 study proposed a detailed structure for mouse P-gp which has 87% sequence 

1256 identity to human P-gp. In addition to the structure of apo P-gp at 3.8 angstroms, 
two structures of P-gp co-crystallised with cyclic peptide inhibitors cyclic-tris-(R)- 
valineselenazole (QZ59-RRR) and cyclic-tris-(S)-valineselenazole (QZ59-SSS) 
were also determined. The structures showed distinct drug-binding sites in the 
internal cavity capable of stereoselectivity that is based on hydrophobic and 
aromatic interactions. The structure of apo P-gp reveals a large internal cavity that 
is approximately 6000 angstroms cubed, and a large gap of 30 angstroms between the 
two nucleotide-binding domains (Figure 1.8). In agreement with previous theories 
(Rosenberg et al., 1997, Higgins and Gottesman, 1992), the apo and drug-bound P- 
gp structures in Aller’s work indicate portals open to the cytoplasm and the inner 
leaflet of the lipid bilayer for drug entry. The inward-facing conformation 
represents an initial stage of the transport cycle that is competent for drug binding 
(Figure 1.8). 



Figure 1.8. (a) P-gp structure and efflux activity; substrates are in red while ATP is 
in magenta, (b) Ligand-binding site on the transmembrane domain of P-gp (adapted 
from Chen et al., 2012). 


The X-ray crystal structures proposed by Aller gave some useful information 
regarding the amino acid residues involved in substrate binding to P-gp. The crystal 
structure (PDB code 3G60) showed one molecule of QZ59-RRR bound to the 
middle site in the binding pocket and two molecules of QZ59-SSS bound at upper 
and lower sites which are overlapping the middle site. This showed that P-gp can 



1279 bind to two drug molecules at the same time and confirmed the diverse and 

1280 polyspecific nature of P-gp (Aller et al., 2009). 

1281 According to Aller and co-workers, the binding pocket of P-gp includes the 

1282 transmembrane helices 1, 6, 7 and 12 which mainly consist of hydrophobic and 

1283 aromatic residues. These included phenylalanine (Phe) and tyrosine (Tyr) residues 

1284 in addition to the polar residues serine, threonine and glutamine 

1285 (Ser, Thr, Gln). Despite these key attributes being made available, questions have 

1286 been raised about the absence of ATP in the structure and the fact that the structures 

1287 do not appear to undergo conformational changes upon drug binding (Gottesman et 

1288 al., 2009). 

1289 Substrates of P-gp mainly interact with the protein by hydrophobic interactions, π-π 

1290 stacking and van der Waals forces. The P-gp X-ray crystal structure also shows this 

1291 as the cyclic peptide inhibitors bind to P-gp through hydrophobic aromatic side 

1292 residues (Aller et al., 2009). Studies have also demonstrated that P-gp is a flexible 

1293 molecule that can alter its conformation to allow substrate entry. These findings 

1294 led to a proposed induced-fit mechanism for drug binding to P-gp, in which the 

1295 substrate enters the large binding pocket and both drug and protein modify their 

1296 shape to generate more favourable contacts unique to that substrate (Alonso et al., 

1297 2006). This mechanism is supported by the X-ray structure of P-gp, where each of 

1298 the ligands bound to P-gp interacts with the protein at distinct or 

1299 overlapping amino acid residues. Recent site-directed mutagenesis studies have 

1300 provided evidence that each substrate can bind to more than one site and all sites 

1301 are capable of transport function (Chufan et al., 2013). 

1302 

1303 1.5.5.1.2. Multidrug Resistance-Associated Protein (MRP, ABCC Subfamily) 

1304 The multidrug resistance-associated protein (MRP) family consists of ABCC1, ABCC2, ABCC3, 

1305 ABCC4, ABCC5, ABCC6, ABCC10, ABCC11 and ABCC12 (You and Morris, 

1306 2007). All of these MRPs act as efflux pumps. 

1307 Many compounds, including glutathione conjugates, have been identified as MRP 

1308 substrates, for example LTD4, S-glutathionyl 2,4-dinitrobenzene (DNP-SG), 

1309 17β-glucuronosyl estradiol, lithocholyltaurine 3-sulfate, oxidized glutathione and 

1310 bilirubin glucuronosides (Jedlitschky et al., 1996). Furthermore, numerous 

1311 unconjugated amphiphilic anions are transported by ABCC1. Examples are folate 

1312 and its antimetabolite methotrexate (Hooijberg et al., 1999). Its function as a pump 

1313 for cytostatic agents confers resistance to a broad range of anti-cancer drugs. MRP1 is 

1314 mostly found in the lung, testis, kidney, and macrophages. MRP1 shares a similar 

1315 distribution pattern with MRP2, which holds the role of excretion and 

1316 detoxification of endogenous and xenobiotic anions in the bile (Nies et al., 2007). 

1317 However, the localization of MRP1 suggests that its role is to protect cells from the toxic 

1318 effects of endogenous and xenobiotic anions rather than to excrete them (Bakos and 

1319 Homolya, 2007). 

1320 ABCC2 (MRP2) is an efflux transporter which transports sulphate, glucuronide 

1321 and glutathione conjugates of many compounds and xenobiotics (Jansen et al., 

1322 1985). This transporter is abundant in the canalicular membrane of the liver and 

1323 plays a crucial role in the biliary transport of anionic conjugates. Studies in mutant 

1324 rats indicated that the lack of functional MRP2 leads to deficiency in the secretion 

1325 of anionic conjugates into bile (Hosokawa et al., 1992). MRP2 has a crucial role in 

1326 the biliary secretion of many endogenous and exogenous compounds (Morikawa, et 

1327 al., 2000) and down-regulation of MRP2 expression leads to impaired biliary 

1328 excretion of amphiphilic anionic conjugates in the rat models of cholestasis 

1329 (Trauner et al., 1997). 

1330 ABCC3 (MRP3) can transport a wide range of endogenous and exogenous 

1331 substrates (mainly conjugated organic anions) to blood circulation. As shown in 

1332 Figures 1.4 and 1.6, unlike MRP2, this transporter is mostly expressed at the 

1333 basolateral membranes of liver and intestine (Ehrhardt and Kim, 2008). Studies in 

1334 mutant rats with chronic conjugated hyperbilirubinemia, which are unable to 

1335 secrete bilirubin glucuronosides into bile, show that hepatic MRP3 expression is 

1336 inducible but appears to be constitutive in other organs (Hirohashi et al., 1998; 

1337 Fernandez-Barrena et al., 2012). MRP3 may function as a “backup” transporter for 

1338 amphipathic conjugates in cholestatic conditions. It may have a role in 

1339 detoxification of hepatocytes by extruding bile acids and other conjugates into 

1340 sinusoidal blood. 
1341 ABCC4 (MRP4) is characterized as an ATP-dependent organic anion transporter. 

1342 Nucleoside monophosphate analogues were the first substrates that were discovered 

1343 for MRP4 (Schuetz et al, 1999). In addition, transport of the prostaglandins PGE1 

1344 and PGE2 is mediated by MRP4 (Reid et al, 2003). MRP4 is found in both 

1345 basolateral and apical membrane localizations. It was found in the apical 

1346 membrane of proximal tubule epithelial cells of human (van Aubel et al, 2002) and 

1347 rat kidney (Denk et al, 2004), and in the basolateral 

1348 membrane in human, rat and mouse hepatocytes (Denk et al, 2004 and Rius et al, 

1349 2003) (See Figures 1.4 and 1.6). 

1350 ABCC5 (MRP5), similar to MRP4, may be found either in basolateral or apical 

1351 membrane. In intact human cells, MRP5 was able to mediate efflux of the anionic 

1352 dye fluorescein diacetate with ATP consumption (McAleer et al, 1999). The 

1353 ABCC6 (MRP6) protein is detectable in liver and kidney, in the basolateral 

1354 membrane of rat (Madon et al, 2000) and in hepatocytes in human (Keppler et al, 

1355 2001). ABCC10, ABCC11, and ABCC12 are recently identified members of the 

1356 MRP family that are at relatively early stages of investigation. ABCC10 and 

1357 ABCC11 are lipophilic anion pumps that are able to confer resistance to 

1358 chemotherapeutic agents. ABCC11 is an efflux pump that is able to transport cyclic 

1359 nucleotides (Guo et al., 2003). It is also able to transport leukotriene C4 (LTC4), 

1360 2,4-dinitrophenyl glutathione (DNP-SG), estradiol 17-β-D-glucuronide (E217βG), 

1361 the monoanionic bile salts cholylglycine and cholyltaurine, folate and its antimetabolite 

1362 methotrexate, and the steroid sulphates E13S and DHEAS (Chen et al., 2005). In humans, 

1363 ABCC11 is localized in neurons of the cerebral cortex. A recent study on the 

1364 localization of ABCC proteins has shown the expression of ABCC11 in Sertoli cells 

1365 of rat testis (Klein et al., 2014). The human gene and the orientation of the 

1366 transmembrane helices of ABCC12 show a high similarity to those of ABCC4 and ABCC5 

1367 (Toyoda et al., 2008; Yabuuchi et al., 2001). No functional characterization has 

1368 been reported so far for ABCC12 (Kruh et al., 2007). 

1369 

1370 
1371 1.5.5.1.3. Breast Cancer Resistance Protein (BCRP, ABCG2 Subfamily) 

1372 BCRP (ABCG2) is another ATP-binding cassette transmembrane transporter 

1373 which transports a range of drugs. It was first identified in MCF-7 human 

1374 breast carcinoma cells, hence the name BCRP (Doyle et al., 1998). Ross et al 

1375 (1999) postulated that BCRP may be the main transporter that causes resistance to 

1376 mitoxantrone in cancer cells. Exposure to mitoxantrone, 

1377 topotecan, or doxorubicin results in over-expression of the ABCG2 gene in mice 

1378 lacking P-gp and MRP; hence the transporter is one of the three major transporters 

1379 involved in multidrug resistance (Allen et al., 1999; Doyle and Ross, 2003). BCRP 

1380 also effluxes non-chemotherapeutic drugs and xenobiotics such as prazosin, 

1381 glyburide, and 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (Ni et al., 2010; 

1382 Saito et al., 2010). BCRP also mediates the intestinal efflux of antibiotics. For 

1383 example, nitrofurantoin, an antibiotic used in treating urinary tract infection, 

1384 has a very high biliary excretion predominantly mediated by BCRP (Merino et al, 

1385 2005b). Human BCRP and mouse Bcrp1 can transport a range of organic substrates, 

1386 including hydrophobic compounds, organic anions, weak bases, and conjugates of 

1387 glucuronide, sulfate, glutamylate and glutathione of many endogenous and 

1388 exogenous molecules. There is overlapping substrate specificity between BCRP and 

1389 P-gp; however, the transport efficacies for these substrates differ (Ni et al., 2010; 

1390 van Herwaarden and Schinkel, 2006). 

1391 Tissue distribution of BCRP is similar to that of P-gp; BCRP is located in the apical 

1392 membrane of epithelial cells of the intestines where it mediates direct intestinal 

1393 excretion of its substrates and in the bile canalicular membrane of hepatocytes it 

1394 stimulates hepatobiliary excretion (Allen et al., 1999). In addition, BCRP has been 

1395 shown to have a protective role in blocking the absorption of drugs into the CNS via the 

1396 blood-brain barrier (Loscher and Potschka, 2005). 

1397 

1398 1.6. Assessment of drug-transporter Interactions 

1399 Transporters impact on both drug safety and efficacy in humans. The effect of transporter 

1400 interactions on the therapeutic and other biological effects of drugs is complicated 

1401 due to the distribution pattern of these transporters in tissues and membrane 

1402 localisations and varying, often complicated, roles in different tissue compositions. 

1403 As a result, interaction of drugs with different transporters can impact their ADME 

1404 properties and may lead to potential drug-drug interactions. In drug discovery, it is 

1405 important to identify the possible drug-drug interactions for a drug candidate and 

1406 evaluate the risk of occurrence in patient populations that are likely to receive a 

1407 concomitant medication (Koenen et al., 2011; Li, 2008). 

1408 Drug transporter interactions may be assessed using in vitro methods and they may 

1409 be estimated using in silico techniques during drug discovery (Li, 2008). Ex vivo 

1410 animal tissues have been traditionally used to measure drug permeability and 

1411 transporter mechanisms, but since the emergence of cell lines overexpressing human transporters, 

1412 these models have limited use in the industry (Obach et al., 2012). The value of 

1413 these assessments in drug development is to enable the prediction of drug-drug 

1414 interaction risk in clinical settings (Li, 2008). 

1415 The experimental study of transporters requires the transporter to be expressed at the 

1416 correct location of the plasma membrane (apical/basolateral) and in the correct orientation. 

1417 During the experiment, the disappearance of drug substance from one compartment 

1418 and/or appearance of the drug in the other compartment is/are measured. In order to 

1419 measure the inhibition of a transporter by a drug, a validated specific substrate of 

1420 that transporter is required to test the inhibitory activity against the transport of the 

1421 substrate (Keogh, 2012). For example, hepatocytes can be grown in collagen 

1422 sandwich cultures allowing them to establish the bile canaliculi necessary for 

1423 directional flux to explore the impact of inhibitors on bile acid transporters (Kotani 

1424 et al, 2011, Maeda et al., 2010, Marion et al., 2011; Nakanishi et al., 2011). In 

1425 addition to primary hepatocytes, renal proximal tubule cells (Brown et al., 2008) 

1426 and brain microvessel endothelial cells (Lippmann et al., 2012) are also used to 

1427 mimic tissue barriers. 

1428 The experimental methods can generate quantitative or semi-quantitative measures 

1429 such as binary data (substrate or non-substrate), IC50, Ki, Km, Vmax, efflux ratio 

1430 and intrinsic permeability. The Michaelis-Menten model of enzyme kinetics is 

1431 generally used to describe the interactions with transporters (Agnani et al., 2011; 

1432 Kolhatkar and Polli, 2010). The dissociation or association constant of the 

1433 inhibitor-enzyme complex and the concentration of the inhibitor that causes 50% inhibition at 

1434 one chosen substrate concentration (IC50) are some of the most common ways to 

1435 present enzyme inhibition data (Li, 2008). 
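
As an illustration of the Michaelis-Menten description mentioned above, the following Python sketch computes a transport rate with and without an inhibitor. The function name `transport_rate` and all parameter values are hypothetical and are not taken from any of the cited studies; a competitive mechanism is assumed purely for the sake of the example.

```python
def transport_rate(s, vmax, km, i=0.0, ki=None):
    """Michaelis-Menten rate v = Vmax*[S] / (Km_app + [S]).

    With a competitive inhibitor (assumed mechanism), Km_app = Km * (1 + [I]/Ki).
    Without an inhibitor, Km_app is simply Km.
    """
    km_app = km if ki is None else km * (1.0 + i / ki)
    return vmax * s / (km_app + s)


# Hypothetical values: Vmax = 100 pmol/min/mg, Km = 10 uM, Ki = 2 uM
v_control = transport_rate(s=10.0, vmax=100.0, km=10.0)
v_inhibited = transport_rate(s=10.0, vmax=100.0, km=10.0, i=2.0, ki=2.0)
print(v_control)    # at [S] = Km the rate is half of Vmax
print(v_inhibited)  # Km_app doubles, so the rate falls to one third of Vmax
```

Under these assumptions the inhibitor changes only the apparent Km, which is why the degree of inhibition observed depends on the substrate concentration chosen for the assay.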

1436 IC50 is defined as the concentration of an inhibitor required to inhibit the enzyme 

1437 activity by half (Copeland, 2005). IC50 can be calculated from inhibitor 

1438 concentrations and percentage of control activity using non-linear regression 

1439 methods (Chiba et al., 2001). Typically, IC50 determinations for 

1440 enzymes and transporters occur in the early stages of preclinical development in order to 

1441 generate preliminary inhibition data on a large set of compounds across a broad set 

1442 of enzymes (Yan and Caldwell, 2001; Crespi and Stresser, 2000). However, it must 

1443 be noted that IC50 values can vary depending on the substrate used, the 

1444 concentration of the labelled ligand (substrate) and different experimental variables 

1445 and conditions (Bohm and Schneider, 2003). An advantage of IC50 determination is 

1446 that it is independent of the inhibition mechanism and needs fewer samples to 

1447 produce a meaningful result (Krishna, 2004). Nevertheless, IC50 values 

1448 are dependent on the experimental and incubation conditions under which they are 

1449 measured (Madan et al., 2002). Thus, an IC50 value is only meaningful at the substrate 

1450 concentration at which it was determined, for all forms of inhibition. 

1451 Depending on the concentration of substrate used in the preliminary IC50 

1452 experiment, there can be a correlation between the IC50 and the inhibition constant 

1453 (Ki), so that the IC50 can be used as an early approximation of Ki (Krishna, 2004). 
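
The estimation of IC50 from inhibitor concentrations and percentage of control activity can be sketched as follows. This is a minimal illustration only: it assumes a simple Hill-type inhibition curve with the slope fixed at 1, and it uses a crude least-squares grid search in place of the non-linear regression routines used in practice. The function names and the data are hypothetical; the data are generated from the model itself rather than measured.

```python
import math


def percent_activity(conc, ic50, hill=1.0):
    """Percentage of control activity predicted by a simple Hill-type inhibition curve."""
    return 100.0 / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, activities, lo=1e-3, hi=1e3, steps=2000):
    """Least-squares estimate of IC50 by scanning a log-spaced grid (Hill slope fixed at 1)."""
    best_ic50, best_sse = None, math.inf
    for k in range(steps + 1):
        ic50 = lo * (hi / lo) ** (k / steps)
        sse = sum((a - percent_activity(c, ic50)) ** 2
                  for c, a in zip(concs, activities))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50


# Hypothetical data: inhibitor concentrations (uM) and % of control activity,
# generated here from the model itself with IC50 = 5 uM purely for illustration.
concs = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0]
activities = [percent_activity(c, 5.0) for c in concs]
estimate = fit_ic50(concs, activities)
print(round(estimate, 2))
```

The estimate is only as fine as the grid spacing, and real assay data would normally also require the Hill slope and the upper and lower plateaus to be fitted.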


1454 The inhibition constant (Ki) plays an important role in predicting the clinical 

1455 significance of inhibition observed in in vitro methods. The Ki is a measure of 

1456 enzyme-inhibitor binding affinity and indicates how potent an inhibitor is. It is the concentration 

1457 required to produce half maximum inhibition. In contrast to the IC50 value, Ki is 

1458 more reproducible because it is less dependent on experimental conditions, as 

1459 it is measured over a range of substrate and inhibitor concentrations (Krishna, 

1460 2004). An IC50 value can be converted to an absolute inhibition constant Ki by the 

1461 Cheng-Prusoff equation. For enzymatic reactions, this equation is: 


1462 $$K_i = \frac{IC_{50}}{1 + \dfrac{[S]}{K_m}} \qquad \text{(Eq. 1.12)}$$
1463 where Ki represents the dissociation constant of the inhibitor, [S] is the fixed substrate 

1464 concentration and Km is the concentration of substrate at which enzyme activity is 

1465 half maximal (Cheng and Prusoff, 1973). In theory, a larger Ki value is an 

1466 indication of low affinity and vice versa. For example, in P-gp inhibition, a small 

1467 Ki value means that the inhibitor strongly blocks P-gp and that the 

1468 transporter-inhibitor complex is more stable. 
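
The conversion in Eq. 1.12 can be written as a short Python sketch. The function name `cheng_prusoff_ki` and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow; this form of the relationship is the one usually applied to competitive inhibition.

```python
def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Convert IC50 to Ki using Eq. 1.12: Ki = IC50 / (1 + [S]/Km)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return ic50 / (1.0 + substrate_conc / km)


# Hypothetical values in uM: IC50 = 10, [S] = 5, Km = 5
ki = cheng_prusoff_ki(ic50=10.0, substrate_conc=5.0, km=5.0)
print(ki)  # [S] equals Km, so Ki is half of the IC50

# When [S] is much lower than Km, Ki approaches the IC50
ki_low_s = cheng_prusoff_ki(ic50=10.0, substrate_conc=0.05, km=5.0)
print(ki_low_s)
```

The second call shows why assays run at a substrate concentration well below Km give IC50 values that are close to Ki.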

1469 Although lab-to-lab variability is a well-established phenomenon for many 

1470 experimental measurements, this may well be more pronounced for transporter 

1471 assays using live cells, as many variables will impact on assay outputs including 

1472 expression levels of the transporter, the potential presence of endogenous transporters, passage 

1473 number and assay formats (Keogh, 2012). A recent cross-pharma comparison of 

1474 quantitative in vitro P-gp inhibition assays using a common substrate digoxin, with 

1475 Caco-2, MDCK-MDR1 or P-gp vesicles, several assay end points, and data 

1476 calculation methods showed limited agreement between assay outputs (Lee, 2011). 

1477 The sources of variability are multi-factorial including cell-type, assay format and 

1478 data manipulation (Bentz et al, 2013). 

1479 For robust and reproducible in vitro transporter inhibition investigation, there is a 

1480 need for characterised probe substrate(s) and inhibitors to determine the transport 

1481 kinetic parameters such as initial rates, Km, Vmax, IC50 or Ki. In binary (yes or no) 

1482 assays, there is a need for a single probe substrate concentration at or below Km, 

1483 with and without inhibitors at concentrations sufficient to cause complete 

1484 inhibition. 

1485 Although animals provide important in vivo mechanistic insights for transporters, 

1486 their utility is limited, due to low throughput, the expense, and more importantly, 

1487 the interspecies differences in transporter tissue distribution, expression levels and 

1488 metabolism which limits the direct translation from preclinical species to humans 

1489 (Obach et al., 2012; Keogh, 2012). 

1490 
1491 1.7. In silico Methods in Drug Discovery 

1492 Traditionally, drugs have been discovered through biological assays and 

1493 time-consuming in vivo and in vitro testing. However, the use of computer modelling in 

1494 drug discovery has developed rapidly, creating techniques and software that 

1495 are able to analyse and predict information from biological, chemical and medical 

1496 data. The term ‘in silico’ refers to the computational approach to drug discovery 

1497 which is complementary to in vivo and in vitro experiments (Ekins et al., 2007). In 

1498 a widely expanding field, in silico techniques have been used to create virtual 

1499 models that enable scientists to make predictions about biological activity and 

1500 provide advances in medicine. Computational methods are used widely in drug 

1501 discovery for the design of virtual compound libraries, identification of lead 

1502 compounds (virtual screening), development of 3-D homology models for the 

1503 biological targets, computing the interaction energies and geometries 

1504 (protein-ligand docking), protein-protein interactions and estimations of biological activity 

1505 of choice (Ekins et al., 2002a). For example, quantitative structure-activity 

1506 relationship (QSAR) has been applied for the analysis of growing collections of 

1507 ADME data and the resulting models are used for the prediction of properties of 

1508 new bioactive compounds (Golbraikh et al., 2014). 

1509 In drug discovery, the use of computational methods to facilitate the discovery 

1510 process is well established and plays an important role in modern drug discovery 

1511 (Krogsgaard-Larsen et al., 2010). Other commonly used in silico methods involve 

1512 pharmacophore modelling that uses 3D structure representations to describe how 

1513 candidate ligands may bind to a target (Ekins et al., 2007). In addition, there are 

1514 target based methods that include docking compounds to a target site and the use of 

1515 scoring functions to score the binding affinity of the ligand to the target. Docking has

1516 gained popularity in recent times and has been involved in the discovery of 

1517 inhibitors of HIV-1 integrase (Hayouka et al., 2010). 

1518 
1519 1.7.1. Quantitative Structure-Activity Relationships (QSAR)

1520 Since the 1960s when it was introduced by Corwin Hansch, QSAR has been used to 

1521 describe the mathematical relationship between the structure of a molecule and 

1522 biological activity (Van de Waterbeemd and Rose, 2003). QSAR models are 

1523 empirical models in which a quantitative description of a chemical structure is 

1524 related to the biological activity through an algorithm to guide future drug design 

1525 (Cumming et al., 2013). The predictive ability of QSAR models is directly

1526 influenced by dataset characteristics such as size and chemical diversity as well as 

1527 employing different molecular modelling techniques, molecular descriptors, and 

1528 statistical model development methods (Golbraikh et al., 2014) and a thorough 

1529 validation of the model for future predictions (Gramatica, 2013). 

1530 QSAR and other computer based methods can significantly reduce the time and the 

1531 cost in drug design and discovery processes. Regression models in QSAR relate a 

1532 set of predictor variables to the numerical potency of the response variable, while a 

1533 classification algorithm relates the predictor variables to a categorical value of the 

1534 response variable. The predictors consist of physicochemical and molecular 

1535 properties of compounds and the QSAR response could be a biological activity of 

1536 the compounds (Nantasenamat et al., 2010). 

1537 The ability to predict a pharmacological activity is important. Predictive models are 

1538 based on the given data, the technique to develop the model and the quality of 

1539 information of the dataset. An ideal QSAR model should be simply understandable,

1540 interpretable and mechanistically relevant (Cronin et al., 2010). A simple model 

1541 should have a very small number of descriptors to form the relationship with the 

1542 dependent variable. In QSAR, information on the effect of particular structural features in

1543 a biological system can help us understand the relationship between molecular structure and

1544 biological activity (Cronin et al., 2010).

1545 

1546 1.7.1.1. Molecular Descriptors 

1547 The manipulation and analysis of chemical structural information is made possible 

1548 through the use of molecular descriptors (Leach and Gillet, 2003). According to Hong
1549 et al. (2008), “Molecular descriptors are used to extract the structural information in

1550 the form of numerical or digital representation that is suitable for model

1551 development, serving as the bridge between the molecular structures and 

1552 physicochemical properties or biological activities of chemicals”. A more

1553 mathematical definition of molecular descriptors has been given by Todeschini and

1554 Consonni: “The molecular descriptor is the final result of a logic and mathematical 

1555 procedure which transforms chemical information encoded within a symbolic 

1556 representation of a molecule into a useful number or the result of some standardized 

1557 experiment” (Todeschini and Consonni, 2008). 

1558 Molecular descriptors play an essential role in chemistry and pharmaceutical

1559 sciences. Molecular descriptors are commonly used in QSAR for the identification 

1560 and unique representation of molecules and fragments which are likely to become 

1561 drug candidates (Malik et al., 2006). Descriptors encode or map the structure of 

1562 molecules into a set of numerical or binary values representing various molecular 

1563 properties which explain activity (Dudek et al., 2006).

1564 Molecular descriptors are classified based on the compound's physicochemical

1565 property, topology, kappa shape indices, molecular fingerprints, and

1566 pharmacophore keys (Dudek et al., 2006). The information contained in a molecular

1567 descriptor about a compound depends on the format in which the chemical is

1568 represented. This could be a one-, two- or three-dimensional representation.

1569 One-dimensional (1D) descriptors represent mainly the molecular formula of the

1570 compound and describe only the bulk properties of the compound such as its 

1571 molecular weight and number of specific atoms. Descriptors based on two- 

1572 dimensional (2D) representations are able to provide information regarding atom 

1573 types, connectivity patterns and topology such as number of aromatic groups,

1574 number of hydrogen bond donors and acceptors, molecular refractivity, number of 

1575 rotatable single bonds, bond distance and branching. 3D descriptors are more 

1576 complex and provide information on conformation, geometry and potential energy, such

1577 as dipole moment, ionisation potential, solvent accessible area, bond energy and 

1578 solvation energy (Hong et al., 2008). 

1579 
1580 1.7.1.1.1. 2D Molecular Descriptors

1581 2D molecular descriptors are defined as numerical properties that can be calculated 

1582 from the connectivity matrix, i.e. connection table representation, of a molecule but 

1583 not from atomic coordinates. Therefore, the 2D descriptors are not dependent on the 

1584 molecular conformation. As a result of this, they can be calculated quickly without

1585 the need for the optimisation of the three-dimensional structures and are most

1586 suitable for large database studies. They can include physical properties such as

1587 sum of formal charges, bond counts, molecular connectivity and shape indexes

1588 (Hall and Kier, 2007), adjacency and distance matrix descriptors (Mihalic et al., 

1589 1992), pharmacophore feature descriptors and partial charge descriptors. 

1590 Examples of 2D molecular descriptors provided by MOE software (Chemical 

1591 Computing Group Inc. Montreal, Canada) include the van der Waals surface area 

1592 calculated using a connection table approximation from 2D structure (vdw_area),

1593 octanol/water partition coefficient (log P), molecular mass density (density), sum of 

1594 formal charges (Fcharge) and sum of the atomic polarisabilities (apol). The number 

1595 of rings (rings), Lipinski’s drug like test (Lipinski et al., 2001) (lip_druglike), and 

1596 number of aromatic bonds (b_ar) are examples of simple count descriptors, which

1597 are considered as 2D descriptors as they require a 2D atomic connection map.

1598 The Kier and Hall connectivity (chi, χ) and shape (kappa, κ) indices are topological

1599 descriptors calculated from the hydrogen suppressed molecular graph (Hall and 

1600 Kier, 1977; 2007). In addition, based on the same graph theory, the atom type 

1601 electrotopological state indexes were suggested. These are atom level indexes that 

1602 combine the electronic character of the atoms and the topological environment for 

1603 each skeletal atom in a molecule (Kier and Hall, 1999). 

1604 Some 2D descriptors are calculated from adjacency or distance matrices. The 

1605 elements of an adjacency matrix for a molecule take the value of one if the two 

1606 atoms are bonded and zero otherwise. The elements of a distance matrix of a 

1607 chemical structure are the length of the shortest path between the two atoms. An 

1608 example of descriptors calculated from an adjacency matrix is the BCUT descriptors

1609 (Pearlman and Smith, 1997). The BCUT descriptors are calculated from the 

1610 eigenvalues of a modified adjacency matrix and are extensions of parameters 
1611 originally developed by Burden (1989). These parameters are based on a

1612 combination of the atomic feature for each atom and a description of the nominal 

1613 bond-type for adjacent and nonadjacent atoms (Stanton, 1999). 
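
As a minimal illustration of the adjacency and distance matrices described above, the following Python sketch builds both for the hydrogen-suppressed graph of 2-methylbutane. The atom numbering, the function names and the use of a breadth-first search are assumptions made for this example; it does not reproduce the BCUT calculation or any routine of the MOE software.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """Return the n x n adjacency matrix: 1 if two atoms are bonded, else 0."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def distance_matrix(adj):
    """Return the topological distance matrix (bonds on the shortest path).

    Assumes a connected molecular graph; unreachable atoms would keep 0.
    """
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from the start atom
            atom = queue.popleft()
            for neighbour in range(n):
                if adj[atom][neighbour] and neighbour not in seen:
                    seen[neighbour] = seen[atom] + 1
                    queue.append(neighbour)
        for atom, d in seen.items():
            dist[start][atom] = d
    return dist


# Hydrogen-suppressed graph of 2-methylbutane (hypothetical numbering):
# chain C0-C1-C2-C3 with the methyl branch C4 attached to C1.
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
adj = adjacency_matrix(5, bonds)
dist = distance_matrix(adj)
```

Topological indices are then obtained by applying simple arithmetic to these matrices, which is why 2D descriptors can be computed without any three-dimensional optimisation.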

1614 Atomic partial charges can be combined by a variety of methods to calculate 

1615 molecule level properties (descriptors). For example, total of all the negative atomic 

1616 charges, or the sum of absolute charges can be calculated for a molecule to 

1617 represent polarity of the molecule. In addition, van der Waals surface area of atoms 

1618 with specific atomic charge ranges can be summed. An example of this is fractional 

1619 positive van der Waals surface area (PEOE_VSA_FPOS) that can be calculated by

1620 MOE software (MOE Help file, 2012). 
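
The way atomic partial charges are combined into molecule-level descriptors can be sketched as follows. The charges and atomic surface areas are invented values for a hypothetical four-atom fragment, and the function is only an illustration of the arithmetic; in particular, how the MOE software treats atoms with exactly zero charge is not assumed here.

```python
def charge_descriptors(charges, vdw_areas):
    """Combine per-atom partial charges and vdW surface areas into descriptors.

    charges   -- partial charge of each atom
    vdw_areas -- van der Waals surface area of each atom (same order)
    """
    total_negative = sum(q for q in charges if q < 0)
    total_absolute = sum(abs(q) for q in charges)
    # Surface area carried by positively charged atoms, as a fraction of total
    positive_area = sum(a for q, a in zip(charges, vdw_areas) if q > 0)
    fractional_positive_area = positive_area / sum(vdw_areas)
    return {
        "total_negative_charge": total_negative,
        "total_absolute_charge": total_absolute,
        "fractional_positive_area": fractional_positive_area,
    }


# Invented example values (not taken from any real molecule)
charges = [-0.4, 0.1, 0.3, -0.2]
vdw_areas = [10.0, 20.0, 10.0, 10.0]
result = charge_descriptors(charges, vdw_areas)
```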

1621 

1622 1.7.1.1.2. 3D Molecular Descriptors 

1623 3D descriptors are also known as shape-based descriptors as they depend on 

1624 internal coordinates, conformation and three dimensional structure of the molecule. 

1625 Such descriptors can be as simple as inter-atomic distances or torsion angles or as 

1626 complex as the distribution of electrostatic potential around a molecule. Similarity

1627 descriptors also allow comparison of the similarity of a molecule with a set of

1628 standard active molecules, on the basis of either electrostatic potential or steric

1629 parameters (Dearden and Cronin, 2005). An example of such molecular descriptors 

1630 is dipole moment, which is controlled by the atomic charges, connection of atoms, 

1631 and the three dimensional shape (internal coordinates) of the molecule. These 

1632 computed 3D descriptors correlate well with the well-known experimentally 

1633 observed physicochemical properties such as solubility (Kombo et al., 2013). 

1634 Due to the importance of the 3D shape, molecular structures need to be optimized

1635 (energy minimization) before the calculation of these descriptors (Akamatsu, 2002).

1636 Molecular orbital descriptors calculated by MOPAC are examples of these

1637 descriptors (Karelson et al., 1996). Surface area, molar volume and shape 

1638 descriptors and conformation dependent charge descriptors are other molecular 

1639 descriptors that are dependent on the 3D shapes of molecules (Sauer and Schwarz, 

1640 2003). 
1641 Volsurf descriptors (known as the vsurf descriptors within the MOE program) were

1642 developed by Cruciani and co-workers (Cruciani et al., 2000a) and noted as an

1643 important class of descriptors for the prediction of pharmacokinetic properties

1644 (Cruciani et al., 2000b). These descriptors are calculated from 3D molecular fields 

1645 of interaction energies also known as GRID (Goodford, 1985) molecular fields. In 

1646 mathematical terms, these are 3D matrices whose elements are

1647 the attractive and repulsive forces between an interacting partner and a target. To 

1648 calculate the Volsurf (and other molecular field) parameters, software first 

1649 computes the fields by placing each molecule into a rectangular 3D grid (Leach and 

1650 Gillet, 2003). Then a probe group is placed at each grid vertex and interaction 

1651 energy between the probe and the molecule at points around the molecule is 

1652 calculated (Goodford, 1985). For instance, MOE software calculates a parameter 

1653 called vsurf_HB, which is calculated using a probe called O (carbonylic oxygen) to

1654 generate 3D H-bond donor fields (Fortuna et al., 2008). The H-bond donor regions 

1655 may be defined as the molecular envelope generating attractive H-bond donor 

1656 interactions. H-bond donor descriptors can be calculated at different energy levels. 

1657 Other 3D molecular descriptors include electrostatic (E_ele) and van der Waals

1658 (E_vdw) components of the potential energy which can be calculated by

1659 semiempirical methods such as those implemented in the MOPAC engine in MOE

1660 software. The dipole moment (AM1_dipole), and the energy of the highest

1661 occupied and the lowest unoccupied molecular orbitals (AM1_HOMO and

1662 AM1_LUMO, respectively) are examples of MOPAC descriptors that are

1663 calculated by the AM1 semiempirical method (Stewart, 1993).

1664 

1665 1.7.1.2. QSAR Model Development and Validation 

1666 QSAR models are statistically significant relationships between a biological 

1667 property and molecular parameters of a set of compounds. The theoretical basis of 

1668 classical QSAR is that the molecular structure is responsible for all the properties 

1669 and biological activities of compounds and similar compounds should have similar 

1670 biological and physicochemical properties (Katritzky et al, 2001). Building a 

1671 model that fits the available data is not adequate as the aim of any modelling 
1672 procedure is to be able to use the models for making future predictions. According

1673 to Gramatica (2011) ‘an ideal QSAR should: 1) consider an adequate number of 

1674 molecules for sufficient statistical representation, 2) have a wide range of quantified 

1675 end-point potency (i.e. several orders of magnitude) for regression models or 

1676 adequate distribution of molecules in each class (i.e. active and inactive) for 

1677 classification models, 3) be applicable for reliable predictions of new chemicals 

1678 (validation and applicability domain) and 4) allow to obtain mechanistic 

1679 information on the modelled end-point.’ Apart from aforementioned criteria for an 

1680 ideal QSAR model, OECD principles for QSAR model validation (OECD 

1681 Guidelines, 2004) may also be used to establish recognized rules for the use of 

1682 QSAR predictions in regulation. 

1683 

1684 1.7.1.2.1. Statistical Modeling Techniques

1685 A wide range of statistical techniques have been applied to the QSAR field. These 

1686 can be classified based on the type of the data being modelled. Categorical data, 

1687 such as the binary data types substrate/non-substrate or active/inactive, can be 

1688 modelled using classification techniques that utilise the molecular descriptors in 

1689 order to divide the data into the respective classes (Han and Kamber, 2006). 

1690 Continuous data such as IC50 values can be subjected to prediction methods.

1691 Prediction methods, also known as regression-based methods, are used to predict 

1692 missing or unavailable numerical data values rather than class labels (Han and 

1693 Kamber, 2006). Among the regression-based approaches, the methods of multiple 

1694 linear regression (MLR) and partial least squares (PLS) regression are prime 

1695 examples in the QSAR field, while examples of classification methods include

1696 discriminant analysis, classification decision trees and support vector machines

1697 (Eriksson et al., 2003). 

1698 Classification and prediction may need to be preceded by ‘relevance analysis’, 

1699 which attempts to identify attributes that do not contribute to the classification or 

1700 prediction process. These attributes can then be excluded. The commonly used 

1701 terminology for this analysis in QSAR field is feature selection (Newby et al., 

1702 2013a) or variable selection (Ghafourian and Cronin, 2006), or data reduction 

1703 (Livingstone, 2004). Due to the large numbers of molecular descriptors that are
1704 available through many commercially available software packages, variable

1705 selection has become a necessity in QSAR model development. This practice is 

1706 essential to avoid overfitting to the training set data and the risk of chance 

1707 correlation (Ghafourian and Cronin, 2006). In addition, fewer molecular descriptors 

1708 increase interpretability and understanding of resulting models (Weaver, 2004) and 

1709 it can provide improved model performance for the prediction of new compounds

1710 (Norinder, 2003). Recently, ‘descriptor pharmacophore’ was introduced as a new 

1711 concept in QSAR on the basis of variable selection. The descriptor pharmacophore 

1712 is defined as a subset of molecular descriptors that lead to the most statistically 

1713 significant QSAR models. It has been demonstrated that chemical similarity 

1714 searches using descriptor pharmacophores as opposed to using all descriptors are

1715 more effective in successful mining of chemical databases or virtual libraries for 

1716 identification of compounds with desired biological activity (Tropsha et al, 1999; 

1717 Tropsha and Zheng, 2001). Feature selection can be split into two broad categories: 

1718 data pre-processing or embedded methods. Data pre-processing feature selection 

1719 involves reduction of the number of molecular descriptors prior to incorporating 

1720 them in the model development exercise. On the other hand, embedded methods 

1721 incorporate the feature selection into the training of the model (Saeys et al., 2007). 

1722 There are some unsupervised feature selection methods that do not use the 

1723 dependent variable in the process of data reduction. An example of these methods, 

1724 which can be used at pre-processing stage, is clustering of the variables. Cluster 

1725 analysis is a useful tool for the visualisation of the clusters of variables as well as 

1726 clusters of compounds (Livingstone, 2004). Another unsupervised method is 

1727 Principal Component Analysis (PCA). This is a multivariate technique in which a

1728 new set of variables called Principal Components (PCs) are created from linear

1729 combinations of original variables. PCs are orthogonal to each other and the first

1730 PC has the maximum information (variance) of the original data. Subsequent PCs

1731 describe the maximum of the remaining variance (Livingstone, 2004). In this way,

1732 only the first few new variables (PCs) will be sufficient to explain the data and the 

1733 remaining variables can be discarded, hence data reduction. 
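
The idea that the first PC is the linear combination carrying the most variance can be illustrated with a short sketch. The two-descriptor data below are invented, and power iteration on the covariance matrix is used here only because it is compact; statistical packages normally use an eigendecomposition or singular value decomposition, and descriptors are usually autoscaled first, which this example omits.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Estimate PC1 by power iteration on the sample covariance matrix.

    Returns (loadings, fraction of total variance explained, scores).
    Assumes the all-ones start vector is not orthogonal to PC1.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    pc_variance = sum(v[a] * cov[a][b] * v[b]
                      for a in range(p) for b in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, pc_variance / total_variance, scores


# Two strongly correlated, invented descriptors for five compounds
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
loadings, explained, scores = first_principal_component(rows)
```

Because the two descriptors are almost collinear, a single score per compound retains nearly all of the variance, which is the sense in which PCA achieves data reduction.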

1734 Other pre-processing techniques can be further split into filter and wrapper 

1735 techniques. Filter techniques usually involve calculating a relative score of the 
1736 molecular descriptors and ranking them in order of best score, and the descriptors

1737 that are at the top of the list are then used as input for classification. Wrapper 

1738 techniques consider a number of subsets of molecular descriptors, evaluate each of 

1739 these based on the predictive performance of a classification model built from that

1740 descriptor subset, and eventually select the descriptor subset with the best

1741 predictive performance (Kohavi and John, 1997).
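
A filter technique of the kind described above can be sketched by scoring each descriptor independently and keeping the top-ranked ones. The absolute Pearson correlation with the response is used as the score purely as an assumption for this example; the descriptor names and values are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_descriptors(descriptors, activity, top_k):
    """Filter selection: score every descriptor alone, keep the best top_k."""
    scores = {name: abs(pearson_r(values, activity))
              for name, values in descriptors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Invented data for five compounds
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "logP": [0.9, 2.1, 2.9, 4.2, 5.0],
    "weight": [5.0, 3.0, 4.0, 1.0, 2.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
selected = rank_descriptors(descriptors, activity, top_k=2)
```

A wrapper technique would instead rebuild and evaluate a model for each candidate subset, which is more expensive but accounts for interactions between descriptors.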

1742 A successful QSAR model depends on the accuracy of the input data and the

1743 selection of appropriate descriptors (Chirico and Gramatica,

1744 2012; Roy, 2007).

1745 

1746 1.7.1.2.2. Validation of QSAR Models

1747 The best fit models may not be the best ones for prediction. Only a stable and 

1748 predictive model can be usefully interpreted for its mechanistic meaning, even 

1749 though this is not always easy or feasible (Gramatica, 2011). The use of these 

1750 statistical techniques in this context leads to ‘statistical learning’ from data that can 

1751 be used for predictions. So far, much effort has been placed into performing some 

1752 form of validation on QSAR models. Usually, this has been in terms of a model’s 

1753 statistical fit and more recently the focus has turned to using an external test set 

1754 (Cronin, 2010). 

1755 Various strategies can be used for validation of QSAR models. According to Wold 

1756 and Eriksson (1995) the most important validation strategies are: 1. internal 

1757 validation set or a standard cross-validation method, 2. external validation by

1758 splitting the dataset into a training set for model development and a validation set to evaluate the

1759 predictive ability of the model, 3. blind external validation (by using the model on a

1760 new external set), 4. data randomisation or Y-scrambling for verifying the absence 

1761 of chance correlation between the dependent variable and descriptors (Wold and 

1762 Eriksson, 1995). 

1763 The general idea of V-fold cross-validation is to divide the overall sample into a 

1764 number of subgroups (V-folds). Subgroups are removed from the training set one at 

1765 a time to serve as the internal test set and the model is developed successively for 
1766 the remaining compounds (V - 1 folds). For each modelling run, some index of

1767 predictive validity is computed for the subgroup that is left out and the results of the 

1768 V replications are averaged to yield a single measure of the stability of the

1769 respective model. The V-fold cross-validation technique is used in various 

1770 analytical procedures to avoid overfitting of the data (Burden, 1989). V-fold cross 

1771 validation is especially useful when the data is not large enough to allow for 

1772 external validation of the model. The leave-one-out (LOO) method can be 

1773 considered as a special case of V-fold cross validation. The outcome of this 

1774 procedure is cross-validated R² (q²), which may be regarded as a criterion of both

1775 robustness and predictive ability of the model. The robustness of LOO procedure 

1776 has been debated recently (Kubinyi et al, 1998; Golbraikh et al, 2003). 
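
The leave-one-out procedure can be written out for a one-descriptor linear model as a minimal sketch. It assumes the commonly used definition q² = 1 − PRESS / Σ(y − ȳ)², where PRESS is the sum of squared prediction errors for the left-out compounds; the data and function names are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Fitted R^2 of the model built on all compounds."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    rss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return 1 - rss / sum((b - my) ** 2 for b in y)


def loo_q_squared(x, y):
    """Leave-one-out q^2: each compound is predicted by a model built without it."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1 - press / sum((b - my) ** 2 for b in y)


# Invented descriptor and activity values for six compounds
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
r2 = r_squared(x, y)
q2 = loo_q_squared(x, y)
```

For a least-squares model q² cannot exceed the fitted R², so a large gap between the two is one sign of an overfitted model.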

1777 Y-randomization is a widely used approach in validation of QSARs which is often 

1778 used along with the cross-validation (Golbraikh et al, 2003). It consists of 

1779 repeating the model calculation procedure with randomized activities and 

1780 subsequent probability assessment of the resultant statistics (Golbraikh et al, 

1781 2003). 
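
Y-randomization can be sketched in the same spirit: the activities are shuffled, the model statistic is recomputed, and the real model is compared with the scrambled ones. The data, the number of trials and the fixed random seed are assumptions made for this example, and the squared correlation of a one-descriptor model stands in for whatever statistic a full QSAR workflow would use.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomisation(x, y, n_trials=200, seed=0):
    """Refit the model on shuffled activities and compare with the real fit."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    shuffled = list(y)
    random_r2 = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        random_r2.append(r_squared(x, shuffled))
    # Fraction of scrambled models that fit at least as well as the real one
    p_value = sum(r >= true_r2 for r in random_r2) / n_trials
    return true_r2, max(random_r2), p_value


# Invented descriptor and activity values for ten compounds
x = [float(i) for i in range(1, 11)]
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1, 9.8]
true_r2, best_random_r2, p_value = y_randomisation(x, y)
```

A model whose statistics are matched by a substantial fraction of the scrambled runs is likely to reflect chance correlation rather than a real structure-activity relationship.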

1782 A more robust way for validation is to use external validation by splitting the 

1783 dataset into training set, for model development, and validation set, to evaluate the 

1784 predictive ability of the model. This is done before building the models so the 

1785 validation set is kept external and not involved at any stage of model development. 

1786 There are different methods for splitting the data into training and validation sets. It 

1787 has been suggested that splitting data should be performed in a way that all 

1788 representative compounds of the validation set are close to the training set 

1789 compounds in the multidimensional descriptor space, and the representative points 

1790 of the training set must be distributed within the whole area occupied by the entire 

1791 dataset (Golbraikh and Tropsha, 2002). The rational division of a dataset into

1792 training and test sets can be done by randomly allocating a fixed proportion of a 

1793 homogeneous dataset to the validation set. In order for the training and validation 

1794 sets to cover similar activity ranges, the data could be ranked according to the

1795 magnitude of the biological response, and every third or fourth chemical could be

1796 removed for the validation set (Sharifi and Ghafourian, 2014). Other selection methods

1797 include selection on the basis of relevant physicochemical descriptors for example 
1798 through multivariate design; this results in a test series of compounds in which all

1799 major structural and chemical properties are systematically varied at the same time 

1800 (Eriksson et al., 2003). An example of the other methods that can ensure similar 

1801 distribution of training and validation set data is K-means-cluster based division of 

1802 training and prediction sets (Leonard and Roy, 2008). 
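
The activity-ranked division described above, in which every third or fourth chemical is moved to the validation set, can be sketched as follows. The function name, the choice of taking the last compound in each group and the activity values are assumptions for this illustration.

```python
def ranked_split(activities, every=4):
    """Rank compounds by activity and move every `every`-th one to validation.

    Returns two lists of compound indices: (training, validation).
    """
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    training, validation = [], []
    for rank, index in enumerate(order):
        if rank % every == every - 1:
            validation.append(index)
        else:
            training.append(index)
    return training, validation


# Invented activity values for eight compounds
activities = [5.0, 1.0, 3.0, 9.0, 7.0, 2.0, 8.0, 4.0]
training, validation = ranked_split(activities, every=4)
```

Because the validation compounds are drawn at regular intervals along the ranked list, both sets span a similar range of the biological response.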

1803 

1804 1.7.1.2.2.1. Applicability Domain

1805 It is usually noted that QSAR is applicable only to compounds that are similar to 

1806 the training set compounds (Katritzky et al., 2001). Structurally limited training 

1807 sets, when the dataset is small or when the chemical diversity is low, are a 

1808 limitation of QSAR models in terms of their application for future predictions 

1809 (Dimitrov et al., 2005). A good model performance on the training set does not 

1810 guarantee that a model will be predictive for validation set or external compounds 

1811 (Stouch et al., 2003). In other words, QSAR models sometimes are not applicable 

1812 to the new compounds. As a result of this, conditions need to be set for the

1813 applicability of QSAR models (Eriksson et al., 2003). This is very important in 

1814 light of the increasing number of commonly termed global QSAR models which 

1815 can be built on small datasets of low diversity (Weaver and Gleeson, 2008), or with 

1816 poorly homogeneous training sets that contain partially overlapping clusters of 

1817 compounds e.g. several classes of chemical compounds or chemotypes (Eriksson et 

1818 al, 2003). Defining a model’s applicability domain is essential in order to 

1819 determine the space of chemical structures that could be predicted reliably.

1820 According to Weaver and Gleeson (2008) the domain of applicability is an 

1821 important concept in quantitative structure-activity relationships (QSAR) that 

1822 allows one to estimate the uncertainty in the prediction of a particular molecule 

1823 based on how similar it is to the compounds used to build the model. In practice, 

1824 there are various methods available for determining the range of applicability of 

1825 QSAR models. For example, Dimitrov et al. (2005) utilized a stepwise approach for

1826 determining the applicability domain of QSAR models based on physicochemical

1827 properties in the training set of toxicity and skin sensitization datasets. This method 

1828 involved four stages to account for the diversity and complexity of the QSAR 
1829 models. First, the range of variation of the physicochemical properties of the

1830 training set compounds was specified. Then the structural similarities between 

1831 chemicals that are correctly predicted by the model were assessed. At the third 

1832 stage, the domain was defined based on a mechanistic understanding of the 

1833 modelled phenomenon. Finally, the reliability of simulated metabolism was 

1834 considered in assessing the reliability of predictions, if metabolic activation of 

1835 chemicals is a part of the (Q)SAR model (Dimitrov et al., 2005). 

1836 Sahigara et al. (2012) have reviewed the applicability domain methods and

1837 classified them into: 1. range-based

1838 and geometric methods; 2. distance-based methods; 3. probability density 

1839 distribution-based methods; 4. other approaches that may include decision trees and 

1840 decision forests approach and stepwise approaches, such as the method suggested 

1841 by Dimitrov et al (2005). Range based methods are the simplest approaches which 

1842 may use a ‘bounding box’ defined on the basis of maximum and minimum values 

1843 of each descriptor used to build the model or principal components of PCA

1844 (Netzeva et al., 2005). In distance-based methods, the distance between an

1845 individual molecule and a defined point within the descriptor

1846 space of the training data is first computed using common distance measures, e.g. Euclidean distance.

1847 Then, a threshold is applied to separate the compounds that are outside the domain 

1848 of applicability. The threshold is a user defined parameter (Xu and Gao, 2003). As 

1849 a distance based method, k nearest neighbour method can be used to measure the 

1850 similarity by calculating the distance between the compound and the nearest 

1851 neighbour compound in the training set (Xu and Gao, 2003). Probability density 

1852 distribution-based methods are some of the most advanced approaches for defining 

1853 applicability domain, as they are able to identify the internal empty regions within 

1854 the data. 
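
The difference between a range-based and a distance-based applicability domain can be illustrated with a small sketch. The training points are invented, pre-scaled descriptor values, the nearest-neighbour Euclidean distance is one of several possible distance criteria, and the threshold of 0.6 is an arbitrary user-defined value chosen for the example.

```python
import math


def in_bounding_box(train, query):
    """Range-based check: every descriptor lies within the training min/max."""
    for j, value in enumerate(query):
        column = [row[j] for row in train]
        if not min(column) <= value <= max(column):
            return False
    return True


def nearest_neighbour_distance(train, query):
    """Euclidean distance from the query to its closest training compound."""
    return min(math.dist(row, query) for row in train)


def in_distance_domain(train, query, threshold):
    """Distance-based check against a user-defined threshold."""
    return nearest_neighbour_distance(train, query) <= threshold


# Invented, pre-scaled descriptor values: four training compounds at the
# corners of a unit square, leaving an empty region in the middle.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centre = (0.5, 0.5)    # inside the box, but far from every training compound
outside = (3.0, 0.5)   # outside the descriptor ranges
near = (0.1, 0.1)      # close to a training compound
```

The centre point passes the bounding-box test yet fails the distance test, which shows why range-based methods cannot detect empty regions inside the descriptor space.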

1855 

1856 1.7.2. Enzyme-ligand Docking 

1857 Availability of a detailed 3D structure for biological drug targets (mainly receptors 

1858 or enzymes) opens the possibility of a number of computer-based techniques in 

1859 drug discovery arena. Structure-based drug design is one such technique that can 
1860 use the information regarding shape and properties of the binding site of target

1861 molecules to design compounds which possess corresponding properties for fitting 

1862 into and interacting with the binding site. Therefore, we require methods for 

1863 determination of 3D structure of the biological targets (Krogsgaard-Larsen et al, 

1864 2010). Other target based methods involve docking compounds to a target site and 

1865 the use of scoring functions to score the binding affinity of the ligand to the target. 

1866 Docking has gained popularity in recent times and has been involved in the discovery of

1867 inhibitors of HIV-1 integrase (Hayouka et al, 2010) and aldose reductase inhibitors 

1868 (Iwata et al, 2001). Enzyme-ligand docking may guide a target’s structural 

1869 requirements for ligand (e.g. substrate/inhibitor) interaction by correlating the 

1870 molecular features of validated ligands with their biological activity (Matsson et al, 

1871 2007; Nicolle et al, 2009; Ahlin et al, 2008; Gombar et al, 2004). The 3D 

1872 structure of a protein can be obtained by prevalent methods such as X-ray 

1873 crystallography and NMR spectroscopy, or predicted by homology modeling 

1874 methods. The quality of an X-ray structure or a homology model is an important 

1875 factor that should be taken into consideration before using the protein (Krogsgaard- 

1876 Larsen et al, 2010). 

1877 

1878 1.7.2.1. Conceptual Frame and Methodology of Molecular Docking 

1879 Computational approaches establish enzyme-ligand binding affinities by using 

1880 structural information of the ligand and target enzyme, thus reducing the time and

1881 materials associated with experiments (Guvench and Mackerell, 2008). After X-ray 

1882 crystallography or multidimensional NMR studies, the solved 3D structures of 

1883 proteins are deposited into the Protein Data Bank (PDB) (RCSB Protein Data Bank,

1884 2014). These structures can be analysed to discover the essential interactions and 

1885 principles of molecular recognition (Raffa, 2001). The forces of interaction that 

1886 bind a substrate to the enzyme active site consist of ionic bonds, hydrogen bonds, 

1887 van der Waals, hydrophobic, dipole-dipole and ion-dipole interactions. Once the 

1888 interactions involved in substrate binding have been established, it is possible to 

1889 look at the structure of a substrate and hypothesize the probable interaction that it 

1890 will have with its active site (Schmidt et al., 2013). The docking process involves 

1891 the prediction of ligand conformation and orientation (or posing) within a targeted
1892 binding site. In general, there are two aims of docking studies: accurate structural

1893 modelling and correct prediction of activity (Kitchen et al., 2004). Docking studies 

1894 can be used to identify the fit between active site of the enzyme and the potential 

1895 ligand. Also, docking can be used as a component of virtual screening, where a 

1896 database of ligands is screened against a target protein (Kitchen et al., 2004).

1897 The docking process consists of two elements: the first is a search for suitable

1898 conformations and the second is the measurement of the affinity of the various

1899 conformations (Dror et al., 2004). The process begins with the application of

1900 docking algorithms that position small molecules in the active site. However, even

1901 relatively simple organic molecules can contain several conformational degrees of

1902 freedom. Conformational analysis is carried out to recognise the conformational

1903 characteristics of the ligand 3D structure created by energy minimization (Secundo,

1904 2013). Energy minimization reduces the potential energy of a given conformation

1905 to make it suitable, but the obtained structure might not necessarily be the most

1906 stable one as energy minimization stops when it reaches the first stable structure 

1907 (the local minimum). To achieve the minimum with the lowest energy, structural 

1908 variations will need to be carried out which helps in reaching the most stable 

1909 confonnation. In protein ligand docking, the docking program aims to find the 

1910 prefened conformation of the ligand at a binding site of the target (Sousa et al., 

1911 2006). Sampling of different conformations must be performed with sufficient 

1912 accuracy to identify the conformation that best matches the receptor structure, and 

1913 must be fast enough to permit the evaluation of thousands of compounds in a given 

1914 docking run. The binding energy is then calculated for each conformation and is 

1915 ranked and scored to give an estimation of the binding affinity between a 

1916 compound and the target. Scoring functions are designed to predict the biological 

1917 activity through the evaluation of interactions between compounds and potential 

1918 targets (Kitchen et al., 2004). 
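
The difference between stopping at a local minimum and searching for the lowest-energy conformation can be illustrated with a minimal sketch. The one-dimensional energy function, the step size and the starting points below are hypothetical choices made only for illustration; they do not correspond to any force field or docking program named in this section.

```python
def energy(x):
    """Toy potential energy along one conformational coordinate (arbitrary units)."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy energy function."""
    return 4 * x**3 - 6 * x + 1


def minimise(x, step=0.01, n_steps=5000):
    """Steepest descent: moves downhill and stops in the nearest (local) minimum."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


def multi_start_minimise(starts):
    """Minimise from several starting structures and keep the lowest-energy result."""
    return min((minimise(x0) for x0 in starts), key=energy)


# A single minimisation from x = 2.0 ends in the shallower of the two wells.
local_only = minimise(2.0)
# Varying the starting structure lets the search find the deeper well.
best = multi_start_minimise([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(local_only, 2), round(best, 2), energy(best) < energy(local_only))
```

Docking programs explore many more degrees of freedom and use more elaborate search strategies (for example genetic algorithms), but the reason for sampling several starting conformations is the same.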

1919 At present, there is a wide range of docking software available in the market with 

1920 different scoring functions. The program AUTODOCK is one of the most cited 

1921 docking programs and uses the Lamarckian genetic algorithm as well as a 

1922 traditional genetic algorithm (Sousa et al., 2006). GOLD is another program that is 

1923 popular in the field and enables flexibility of the protein hydrogen bonds as well as 

1924 the ligand being tested. Unlike AUTODOCK, docking scores in GOLD are ranked 

1925 using a force field scoring function that includes the contributions of hydrophobic 

1926 interactions, van der Waals forces and number of hydrogen bonds (Cummings et 

1927 al., 2005). FlexX is another software package that permits protein flexibility and 

1928 scores the final position of molecules using the empirical Böhm's scoring function 

1929 (Sousa et al., 2006). In addition to these aforementioned programs, the Molecular 

1930 Operating Environment (MOE) is a suite of applications that can be used for 

1931 medicinal chemistry purposes. It includes a docking tool that searches for 

1932 complementary binding poses between a ligand and a rigid receptor, which can be 

1933 used to determine interactions between candidate ligands and targets. 

1934 

1935 1.7.2.2. Scoring Functions 

1936 Scoring functions are used to calculate the binding energy of poses generated after 

1937 docking placements. A very accurate scoring function is desired to be able to 

1938 successfully predict binding affinity, however due to the complexity and high 

1939 computational cost involved, scoring functions make assumptions about molecular 

1940 interactions based on experimental data from independent reactions (Lipkowitz and 

1941 Boyd, 2002). For energy-like scoring functions, a lower score indicates a more favourable 

1942 pose while higher scores suggest that binding is less likely. Scoring functions are 

1943 based on different calculation methods and can be divided into three categories: 

1944 knowledge-based, force field and empirical based methods. 

1945 Knowledge-based functions use data from statistical analysis of structural 

1946 complexes in the protein data bank, to estimate interatomic reactions occurring 

1947 frequently between a ligand and the protein in specified intervals (Schulz-Gasch 

1948 and Stahl, 2004). 

1949 GoldScore, Assisted Model Building with Energy Refinement (AMBER) and the 

1950 Optimised Potentials for Liquid Simulations function (OPLS), are examples of 

1951 force-field scoring functions. Force-field scores are calculated by measuring 

1952 electrostatic and van der Waals interactions (Schulz-Gasch and Stahl, 2004) but are 

1953 limited by the exclusion of solvation and entropic properties (Sousa et al., 2006). 

In contrast to these two scoring functions, empirical scores estimate the free energy of 
binding based on a sum of localised independent reactions (Lipkowitz and Boyd, 
2002). In most cases, the constants in empirical formulas are derived from binding 
energies calculated in experiments of receptor-ligand complexes (Sousa et al., 
2006). An example of an empirical scoring function is the London dG scoring function 
utilised in MOE (Equation 1.13). 



$$\Delta G = c + E_{flex} + \sum_{h\text{-}bonds} c_{HB}\, f_{HB} + \sum_{metal\text{-}lig} c_{M}\, f_{M} + \sum_{atoms\ i} \Delta D_{i}$$

Equation 1.13. London dG Scoring Function (Corbeil et al., 2012) 

The formula above estimates the binding energy, where $E_{flex}$ represents the energy due 
to loss of flexibility of the ligand, $f_{HB}$ and $c_{HB}$ are measurements of hydrogen bonds, 
while $c_{M}$ and $f_{M}$ measure energies related to metal ligation, $\Delta D_{i}$ is the desolvation 
energy of atom $i$, and $c$ represents the average gain/loss of rotational and translational entropy. 
The Generalized-Born volume integral/weighted surface area (GBVI/WSA dG) function is a 
separate scoring function available in the MOE software. 

Early scoring functions evaluated compound fits. Relatively simple scoring 
functions, on the basis of approximate shape and electrostatic complementarities, 
are heavily used during the early stages of docking simulations and in virtual 
screening of compounds. The selected conformers can then be further evaluated 
using more complex scoring schemes with more detailed treatment of electrostatic 
and van der Waals interactions, and inclusion of at least some solvation or entropic 
effects (Gohlke and Klebe, 2002). 

1976 2. Aims and Objectives 

1977 

1978 Biliary excretion is one of the main elimination routes of compounds and/or their 

1979 metabolites with consequent effects on drug half-life and possible implications on 

1980 gastro-hepatic cycle. The prediction of biliary excretion is a key target in the drug 

1981 design and it helps with the selection of candidates for the development stage. The 

1982 broad aim of the project involved not only the use of Quantitative Structure- 

1983 Activity Relationships (QSAR) and data mining tools for estimation of biliary 

1984 excretion, but also investigating the role of several transporter proteins in this 

1985 elimination route. In this investigation, the aim was to use a combination of various 

1986 available methods in order to achieve the best predictive models. The methods 

1987 included stepwise regression analysis, Classification and Regression Trees 

1988 (C&RT), Chi-square Automatic Interaction Detector (CHAID), Boosted trees (BT), 

1989 Random Forest (RF) and Multivariate Adaptive Regression Splines (MARS) 

1990 models. 

1991 The objectives can be summarised below: 

1992 1) To build several validated statistical models for the estimation of 

1993 biliary excretion, using data collated from the literature on the percentage of intact 

1994 compound excreted through bile. 

1995 2) To build several validated statistical models for the estimation of binding to the 

1996 efflux transporter P-gp. 

1997 3) To build several validated statistical models for the estimation of binding to 

1998 the uptake transporters OATP1B1, OATP1B3, and OATP2B1, which are known to 

1999 have significant roles in biliary excretion of compounds (Pfeifer et al., 2014; 

2000 Kusuhara and Sugiyama, 2002). 

2001 4) Using the most accurate QSARs for each transporter, to predict the binding 

2002 activity of compounds in the biliary excretion dataset; and incorporation of 

2003 predicted transporter binding values in the QSAR models from stage 2 and 3 for the 

2004 prediction of percentage excretion of compounds through bile. This workflow has 

2005 been summarised in Figure 2.1. 

2006 5) A further objective of this project was to investigate ligand-transporter docking 

2007 as a prediction tool for the estimation of binding of compounds to the transporters. 

2008 The docking score was used as a molecular descriptor for the 

2009 prediction of compounds binding to P-gp. 

2013 Figure 2.1. A diagram representing phase II of this project 


3. Methods 


2016 

2017 The major methods employed in this work consisted of various QSAR and 

2018 molecular docking techniques that were used for the estimation of biliary excretion 

2019 and binding of compounds to the transporters, P-gp, OATP1B1, OATP1B3 and 

2020 OATP2B1. 

2021 3.1. Datasets 

2022 The datasets for each investigation have been explained in the relevant chapters 

2023 (Chapters 4-6). Table 3.1 gives a summary of the datasets. 

2024 Table 3.1. Summary of the datasets used in developing models. 


Dataset              N      Data type
-------------------  -----  -----------------------------------------------------------------
Biliary excretion    217    Percentage of intact dose excreted through bile in rats (log BE%)
P-gp binding         219    Inhibition constant (log Ki) measured in vitro
OATP binding         225    Percentage inhibition measured in vitro

2025 

2026 3.2. Calculation of Molecular Descriptors 

2027 3.2.1. ACD Labs/LogD Suite 12.0. 

2028 Simplified Molecular Input Line Entry System (SMILES) notations for all 

2029 compounds were obtained by searching for their systematic names in the ACD/Dictionary (ACD 

2030 Labs/LogD suite version 12.0, Advanced Chemistry Development Inc., Ontario, 

2031 Canada). If compounds were not available in the ACD/dictionary, then ChemFinder 

2032 gateway version 3.0 (CambridgeSoft, USA) was utilized to obtain the molecular 

2033 structure. Moreover, SMILES codes were double-checked in the online database 

2034 ChemSpider (approved by the community of Royal Society of Chemistry - RSC) 

2035 (ChemSpider, 2001). The SMILES notation of each compound was generated either 

2036 by entering the systematic name of the compound in the ACD/Dictionary to acquire 

2037 their molecular structures and SMILES codes or by drawing the structure in the 

2038 software and then obtaining the SMILES for the drawn structure. 

Compound names and SMILES codes were copied from Excel into a Notepad file 
and saved in txt format. The Notepad file was imported into the ACD history view and 
different physicochemical properties were calculated for all compounds. The 
properties included the logarithm of the octanol/water partition coefficient (LogP), the 
logarithm of the apparent partition coefficient (LogD) at pH values of 2, 5.5, 6.5, 
7.4 and 10, the dissociation constant (pKa) for acidic and basic compounds, molar 
volume, index of refraction, polarisability, polar surface area and others. 

The fractions of each compound ionised at pH 7.4 were calculated from the dissociation 
constants (pKa). The fraction ionised at pH 7.4 as acid (FiA), 
as base (FiB), or (for zwitterionic compounds) as acid and base (FiAB), and the 
fraction unionised (Fu) were calculated from the lowest acidic and the highest basic 
pKa values and are presented in Equations 3.1 to 3.4, respectively (Ghafourian et al., 
2006). 


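$$\mathrm{FiA} = \frac{1}{1 + \mathrm{antilog}(\mathrm{p}K_a - 7.4)} \qquad \text{(Eq. 3.1)}$$

$$\mathrm{FiB} = \frac{1}{1 + \mathrm{antilog}(7.4 - \mathrm{p}K_a)} \qquad \text{(Eq. 3.2)}$$

$$\mathrm{FiAB} = \mathrm{FiA} \times \mathrm{FiB} \qquad \text{(Eq. 3.3)}$$

$$\mathrm{Fu} = (1 - \mathrm{FiA}) \times (1 - \mathrm{FiB}) \qquad \text{(Eq. 3.4)}$$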


In Equations 3.1 and 3.2, pKa is the most acidic and the most basic pKa, 
respectively, which were obtained from ACD Labs pKa database and, in case the 
experimental pKa was not available, it was calculated by the software. 
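
As a worked illustration of Equations 3.1 to 3.4, the following minimal sketch computes the four fractions at pH 7.4. The function name, the dictionary keys and the convention of returning zero for a missing acidic or basic group are assumptions made for this example; they are not taken from the ACD software.

```python
def ionisation_fractions(pka_acid=None, pka_base=None, ph=7.4):
    """Fractions ionised/unionised following Eqs. 3.1 to 3.4.

    pka_acid: most acidic pKa (None if the compound has no acidic group).
    pka_base: most basic pKa (None if the compound has no basic group).
    Assumption: a missing group contributes an ionised fraction of zero.
    """
    fia = 1.0 / (1.0 + 10 ** (pka_acid - ph)) if pka_acid is not None else 0.0
    fib = 1.0 / (1.0 + 10 ** (ph - pka_base)) if pka_base is not None else 0.0
    return {
        "FiA": fia,                      # Eq. 3.1
        "FiB": fib,                      # Eq. 3.2
        "FiAB": fia * fib,               # Eq. 3.3
        "Fu": (1.0 - fia) * (1.0 - fib)  # Eq. 3.4
    }


# Hypothetical acid with pKa 4.4: almost fully ionised at pH 7.4.
acid = ionisation_fractions(pka_acid=4.4)
# Hypothetical zwitterion with an acidic pKa of 4.4 and a basic pKa of 9.4.
zwitterion = ionisation_fractions(pka_acid=4.4, pka_base=9.4)
print(acid, zwitterion)
```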

The ACD/LogD calculations were performed for all compounds and the results 
were transferred to a Microsoft Excel worksheet. 

2064 3.2.2. TSAR 3D 

2065 Using TSAR 3D software (Version 3.3., Accelrys Ltd.) additional molecular 

2066 descriptors were calculated. The SD file created by ACD software was imported 

2067 into TSAR 3D. In this software, each row stored information about one compound 

2068 and each column stored a molecular descriptor. Initially, the partial atomic charges 

2069 were calculated for the molecules and COSMIC optimize 3D was applied to 

2070 minimize the molecular potential energies. This was essential since the generation 

2071 of 3D descriptors needs to be based on an accurate 3D molecular structure and 

2072 geometry. However, due to errors in some of the imported structures, COSMIC 

2073 energy minimisation did not automatically work for some of the compounds. Hence 

2074 the 3D structures of these compounds were modified manually by using the 3D 

2075 visualise tab in TSAR 3D to correct the errors and then run the COSMIC 

2076 minimisation. In most cases the structural errors were due to the valence state of 

2077 atoms which varied between ACD generated SD files and those in TSAR 3D. For 

2078 some of the compounds the SD molecular file format could not be used and the 

2079 SMILES codes were imported to TSAR instead. The SMILES codes and the 

2080 compound names were copied and pasted in MS-Word as ‘text’. Using the “Find” 

2081 icon, the document was edited by finding “Tab” and replacing with “space”. The 

2082 edited document was then copied into WordPad and saved as text with .smi file 

2083 extension. The codes were then imported into TSAR 3D and eventually COSMIC 

2084 minimisation was successfully executed. In a few cases, calculations by TSAR 3D 

2085 were not possible. For example, the presence of the heavy metal platinum (Pt) in the 

2086 structure of a compound would lead to such an error. 

2087 A series of descriptors consisting of electronic, steric and hydrophobic parameters 

2088 as well as topological indexes were calculated using TSAR 3D for each compound. 

2089 The quantum mechanical properties were calculated using VAMP electrostatic 

2090 routine in TSAR 3D. The method used in VAMP was the semi-empirical approach, 

2091 AM1 Hamiltonian. The calculated quantum mechanical properties include 

2092 electronic energy, total energy, accessible surface area, mean polarisability, dipole 

2093 moment, energy of the highest occupied molecular orbital (HOMO), and energy of 

2094 the lowest unoccupied molecular orbital (LUMO). VAMP calculations were not 

2095 possible for compounds with more than 50 heavy atoms in their molecular 

2096 structures. The minimized molecular structures were saved as an SD file and the 

2097 molecular descriptors were exported to Excel. 

2098 

2099 3.2.3. Molecular Operating Environment (MOE) 

2100 The saved SMILES codes and names from ACD Labs/LogD were imported into 

2101 MOE software (Version 2012.10, Chemical Computing Group Inc. Montreal, 

2102 Canada). Using the wash tab, any unwanted fragments including salts and water 

2103 molecules were removed from the molecular structures. This process also 

2104 neutralized the protonated state of any charged structure. 

2105 Following the wash procedure, energy minimization was carried out in order to 

2106 calculate atomic coordinates corresponding to the local minima. Within the energy 

2107 minimization function, the “preserve existing chirality” was also selected. 

2108 Thereafter, self-consistent field (SCF) calculations were performed. The SCF 

2109 energy minimization technique constructs an initial guess density matrix, in terms 

2110 of the atomic orbitals, and then iteratively refines it by correcting the kinetic 

2111 energy, nuclear energy and electron-electron repulsion. This allows the density 

2112 matrix to be self-consistent. The parameters calculated by SCF for the minimized 

2113 structures were SCF energy, HOMO-LUMO energy gap, heat of formation and 

2114 dipole moment. 

2115 Finally, after SCF energy minimization, all molecular descriptors were calculated 

2116 for each of the compounds and all data were saved in SD format and exported to 

2117 Excel. 

2118 

2119 3.2.4. Symyx QSAR version 2.2 

2120 Symyx QSAR software (previously known as MDL-QSAR) was used to obtain 

2121 additional molecular descriptors for the compounds in the datasets. Symyx QSAR 

2122 can calculate some additional molecular descriptors such as atom type 

2123 electrotopological indexes. The SD file from MOE was imported and 

2124 electrotopological state indexes for different atom types along with other 

2125 topological indexes were calculated. The molecular descriptors were then exported 

2126 into an Excel file. 

2127 

2128 3.3. Development and Validation of QSAR Models 

2129 In this work, various data analytical techniques were used for the development of 

2130 QSAR models. Datasets of compounds were first divided into training and external 

2131 validation sets. In order for the training and test sets to have a similar range of 

2132 biological activities, compounds in each dataset were ordered according to the 

2133 relevant response variable and, depending on the size of the dataset, from each 

2134 group of five or four compounds one was allocated into the external validation set. 

2135 Models were developed using the training set compounds. These models were used 

2136 for the estimation of the response variable for the external validation set. The 

2137 details of these processes for individual datasets have been explained in the relevant 

2138 chapters. 
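
The allocation scheme described above can be sketched as follows. Which member of each group of five (or four) was moved to the external validation set is not stated in the text, so taking the middle member of each group is an assumption of this sketch, as are the function and variable names and the example data.

```python
def ordered_split(compounds, response, group_size=5):
    """Sort compounds by response and move one compound per group to validation.

    Assumption: the middle-ranked member of each consecutive group of
    `group_size` compounds is allocated to the external validation set.
    """
    order = sorted(range(len(response)), key=lambda i: response[i])
    pick = group_size // 2
    training, external = [], []
    for rank, idx in enumerate(order):
        if rank % group_size == pick:
            external.append(compounds[idx])
        else:
            training.append(compounds[idx])
    return training, external


# Ten hypothetical compounds with hypothetical log BE% values.
names = list("ABCDEFGHIJ")
log_be = [0.3, 2.1, 1.5, 0.9, 1.1, 0.1, 1.9, 0.7, 1.7, 1.3]
training, external = ordered_split(names, log_be, group_size=5)
print(training, external)
```

Because the compounds are ranked before allocation, both sets span a similar range of the response variable, which is the stated purpose of the procedure.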

2139 Goodness-of-fit in prediction (regression based) models 

2140 Discrepancy between observed and predicted values shows the error, and is used to 

2141 assess the accuracy of QSAR models. The mean absolute error (MAE), root mean 

2142 squared error (RMSE) and mean fold error (MFE) were utilised to assess the 

2143 accuracy of predictions by QSAR models. 

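$$\mathrm{MAE} = \frac{\sum \left| \mathrm{observed} - \mathrm{predicted} \right|}{N} \qquad \text{(Eq. 3.5)}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum \left( \mathrm{observed} - \mathrm{predicted} \right)^{2}}{N}} \qquad \text{(Eq. 3.6)}$$

$$\text{Mean Fold Error} = \mathrm{antilog}\left( \frac{\sum \left| \log \mathrm{observed} - \log \mathrm{predicted} \right|}{N} \right) \qquad \text{(Eq. 3.7)}$$

where N is the number of compounds. 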
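
The three error measures (Equations 3.5 to 3.7) can be computed with a short sketch such as the one below. It assumes that the mean fold error uses the absolute difference of base-10 logarithms averaged over the N compounds, and that observed and predicted values are supplied on the original (non-logarithmic, positive) scale; the function names and example values are illustrative only.

```python
import math


def mae(observed, predicted):
    """Mean absolute error (Eq. 3.5)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error (Eq. 3.6)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mean_fold_error(observed, predicted):
    """Mean fold error (Eq. 3.7); inputs must be positive, non-log values."""
    total = sum(abs(math.log10(o) - math.log10(p))
                for o, p in zip(observed, predicted))
    return 10 ** (total / len(observed))


# Hypothetical observed and predicted values.
print(mae([1, 2, 3], [2, 2, 5]))
print(rmse([1, 2, 3], [2, 2, 5]))
print(mean_fold_error([10, 100], [100, 100]))
```

A mean fold error of 1 corresponds to perfect prediction, and a value of 2 means that predictions are, on average, within a factor of two of the observed values.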

2150 Calculation of error in classification models 

2151 There are many performance measures used to validate the 

2152 predictive power of classification models. The performance of each algorithm was 

2153 measured using three performance measures, sensitivity (SE), specificity (SP) and 

2154 overall accuracy. 

2155 Sensitivity is the proportion of compounds correctly predicted to be positive relative to 

2156 all the compounds experimentally determined to be positive: 

2157 Sensitivity = TP / (TP + FN) Eq. 3.8 

2158 where TP is the number of true positives, TN is the number of true negatives, FP is the 

2159 number of false positives, and FN is the number of false negatives. 

2160 Specificity is the proportion of compounds correctly predicted to be negative relative to 

2161 all the compounds experimentally determined to be negative: 

2162 Specificity = TN / (TN + FP) Eq. 3.9 

2163 Overall accuracy in this study is defined as: 

2164 Overall Accuracy = SP x SE Eq. 3.10 
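
Equations 3.8 to 3.10 translate directly into code. The sketch below uses a hypothetical confusion matrix; note that the overall accuracy follows the definition used in this study (the product of specificity and sensitivity), not the more common fraction of correctly classified compounds.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and overall accuracy as defined in Eqs. 3.8 to 3.10."""
    sensitivity = tp / (tp + fn)           # Eq. 3.8
    specificity = tn / (tn + fp)           # Eq. 3.9
    overall = specificity * sensitivity    # Eq. 3.10 (study-specific definition)
    return {"SE": sensitivity, "SP": specificity, "overall": overall}


# Hypothetical counts: 50 experimentally positive and 50 negative compounds.
metrics = classification_metrics(tp=40, tn=30, fp=20, fn=10)
print(metrics)
```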

2165 

2166 3.3.1. Stepwise Regression Analysis 

2167 Minitab Statistical Software Version 16 was used for the development of multiple 

2168 linear regression (MLR) models. In stepwise regression analysis, independent 

2169 variables were normally all the molecular descriptors and the dependent (response) 

2170 variable was the activity under investigation. For example, the logarithm of the 

2171 percentage dose excreted via bile (log BE%) was the dependent variable in Chapter 

2172 4. In all regression analyses, a P value of less than 0.05 was considered to be 

2173 statistically significant for the variables. Values for “alpha to enter” and “alpha to 

2174 remove” items in stepwise regression method were set to 0.05. 

2176 3.3.2. Classification and Regression Trees (C&RT) 

2177 Introduced by Breiman in 1984, C&RT are decision tree algorithms that produce 

2178 classification or regression trees depending on whether the dependent variable is 

2179 categorical or numerical. The analysis uses the Gini coefficient as an identifier of 

2180 suitable splitting criteria (Breiman et al., 1984). Based on recursive partitioning, 

2181 C&RTs are constructed by successively splitting a dataset into increasingly 

2182 homogeneous subsets until it is infeasible to continue, based on a set of stopping 

2183 rules (StatSoft, 2009). The analysis has an embedded feature selection method 

2184 which picks the most significant molecular descriptors for splitting the data into the 

2185 two most homogeneous groups (called branches or nodes). The process works by 

2186 monitoring the error on the test data during growth and choosing the tree with 

2187 minimal error (Breiman et al., 1984). This algorithm starts off with the complete 

2188 training set, evaluates all available attributes (e.g. molecular descriptors) and chooses 

2189 the one which best separates it. It then recursively proceeds to split the resulting 

2190 subsets until no improvement can be made by continuing to split; this happens 

2191 when the tree reaches a certain complexity based on the pre-set stopping criteria or 

2192 until all the data in the nodes have the same value. 
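
A single partitioning step of a classification tree can be sketched as below: every descriptor and every candidate threshold is evaluated, and the split giving the lowest weighted Gini impurity is kept. This is a simplified illustration written for this section, not the STATISTICA implementation; the descriptor values and class labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Return (descriptor index, threshold, weighted Gini) of the best binary split."""
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y[i] for i in range(n) if X[i][j] <= threshold]
            right = [y[i] for i in range(n) if X[i][j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Four hypothetical compounds described by two hypothetical descriptors.
X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = ["low", "low", "high", "high"]
print(best_split(X, y))
```

A full tree is obtained by applying the same search recursively to the two resulting subsets until a stopping rule is met.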

2193 STATISTICA software has a Classification and Regression Trees (C&RT) routine, 

2194 which can develop a classification tree (CT) or a regression tree (RT) by selecting the 

2195 most significant molecular descriptors out of the descriptor pool at every step of 

2196 partitioning. C&RTs can also be built interactively, using the manually selected 

2197 descriptors. 

2198 Stopping rules are the criteria used to find the right-sized tree. The size of a tree in 

2199 C&RT analysis is an important issue, since an unreasonably big tree can lead to 

2200 overfitting and make the interpretation of results more difficult. Stopping 

2201 parameters could be a combination of the minimum number of cases, the maximum 

2202 number of levels, the maximum number of nodes, and minimum fraction of objects 

2203 for splitting. The parameters have mainly to do with which nodes should be split 

2204 and which should be terminal nodes. STATISTICA offers two choices for stopping 

2205 nodes: 1. Prune variance, and, 2. FACT direct stopping. When using deviance, the 

2206 minimum number of cases and maximum number of nodes are used for stopping. 

2207 For example, with the minimum number of cases equal to 100, a node with fewer than 100 

2208 cases will be a terminal node and no further split will be made. The maximum 

2209 number of nodes controls the overall tree complexity. The default stopping 

2210 parameters in STATISTICA software depend on the number of data points (number 

2211 of compounds). For the FACT style stopping method, fraction parameters, rather 

2212 than number of compounds, will determine if a node should be split. 

2213 The advantage of C&RTs is the simplicity of interpretation of results summarized 

2214 in a tree. The final results of using tree methods for classification or regression can 

2215 be summarized in a series of logical if-then conditions (tree nodes). Therefore, there 

2216 is no implicit assumption that the underlying relationships between the predictor 

2217 variables and the dependent variable are linear. 

2218 

2219 3.3.3. Interactive Tree (I-tree) Using C&RT 

2220 Interactive tree is a C&RT-style tree, which allows for the molecular descriptors to 

2221 be selected manually by the operator. This tool is useful when investigating the 

2222 effect of certain variables/ molecular descriptors on the property under 

2223 investigation. In I-tree, apart from the usual V-fold cross-validation procedure, 

2224 another cross-validation option, “Cross-validate tree sequence” was also applied. 

2225 This validation method is applied to the entire tree sequence, instead of just the 

2226 final tree in V-fold cross-validation (Hill and Lewicki, 2006). 

2227 

2228 3.3.4. Chi-square Automatic Interaction Detector (CHAID) 

2229 The Chi-square Automatic Interaction Detector (CHAID) is one of the oldest 

2230 decision tree methods initially suggested by Kass in 1980 (Kass, 1980). This tool 

2231 performs multi-level splits where C&RT uses binary splits. CHAID is well suited 

2232 for large datasets. Cross-validation, either V-fold or train and test samples, can be 

2233 used to safeguard against overfitting the CHAID tree. The stopping criteria 

2234 include the minimum number of cases for splitting, maximum number of nodes, 

2235 probability for splitting and probability for merging. To test the statistical 

2236 significance of splits, CHAID computes a Bonferroni adjusted P-value for the 

2237 respective descriptor (Hill and Lewicki, 2006). Bonferroni adjustment is an option 

2238 in CHAID, used to control the type I error rate (familywise error rate) when 

2239 testing multiple hypotheses. It is usually accomplished by dividing the alpha level 

2240 by the number of tests being performed (usually 0.05 / n). In this work, we 

2241 employed Bonferroni adjustment as our preliminary results showed lower cross 

2242 validation error when this adjustment was used. 
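
The generic Bonferroni adjustment described above amounts to the short calculation sketched below. The way CHAID counts the number of tests for a given descriptor is specific to the algorithm and is not reproduced here; the function names and P-values are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction (alpha / n)."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Equivalent view: multiply each P-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# With ten tests, each individual test must reach P < 0.005.
print(bonferroni_alpha(0.05, 10))
# Three hypothetical raw P-values and their adjusted counterparts.
print(bonferroni_adjust([0.01, 0.04, 0.5]))
```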

2243 

2244 3.3.5. Boosted Trees (BT) 

2245 Boosted trees analysis generates a series of very simple boosting regression trees 

2246 (BT) where each successive tree is built for the prediction of residuals of the 

2247 preceding tree. Each of these trees has a weak predictive accuracy, but using the 

2248 weak predictors together can create a strong predictor (Hill and Lewicki, 2006). 

2249 The user-defined parameters in this analysis include the learning rate, the number 

2250 of additive terms (number of trees), random test data proportion (fraction of data 

2251 points in testing pool) and subsample proportion. The seed for random number 

2252 generation that controls which cases are selected in sampling was set to one. The 

2253 maximum number of nodes was set to three, which means that each tree will have 

2254 just one binary split. 
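
The idea of fitting each successive three-node tree to the residuals of the preceding ones can be sketched in a few lines. This is a simplified illustration with a single hypothetical descriptor: it omits the random test data proportion and the subsampling used by the software, and the learning rate, number of trees and data are arbitrary example values.

```python
def fit_stump(x, residuals):
    """Best single binary split (root plus two leaves) by residual sum of squares."""
    best = None
    values = sorted(set(x))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Additive model: each stump is fitted to the residuals of the current model."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            pi + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, pi in zip(x, predictions)
        ]
    return {"base": base, "rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    """Sum the shrunken contributions of all stumps for one descriptor value."""
    total = sum(lm if xi <= t else rm for t, lm, rm in model["stumps"])
    return model["base"] + model["rate"] * total


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
model = boost(x, y, n_trees=50, learning_rate=0.1)
print(predict(model, 1), predict(model, 6))
```

Each stump on its own is a weak predictor; the small learning rate means that many of them are needed, which is why the number of trees and the learning rate are tuned together.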

2255 

2256 3.3.6. Random Forest Trees Model (RF) 

2257 A random forest (RF) model is an ensemble of tree predictors such that each tree 

2258 depends on the values of a random vector (a random selection of molecular 

2259 descriptors and training set compounds) sampled independently. The method builds 

2260 a series of simple trees where the predictions are taken to be the average of the 

2261 predictions of all the trees (Breiman, 2001). The analysis removes a user defined 

2262 portion of the data and keeps it as the internal test data. The remaining training set 

2263 data is sampled consecutively and models are developed for each subsample. 

2264 Various subsample proportions along with different numbers of trees may be 

2265 selected. The number of predictors (to be randomly considered at each node) was 

2266 set to nine throughout the thesis (to limit chance correlation and overfitting). The 

2267 default settings were used for stopping conditions including minimum number of 

2268 cases, maximum number of levels, minimum number in child node and the 

2269 maximum number of nodes which is different depending on the size of the dataset. 

2270 The best model was selected based on the estimation error for the internal test data. 
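
The averaging of many simple trees, each built on a random subsample of compounds and a random subset of descriptors, can be sketched as follows. To keep the example short each tree is limited to a single split, so the random choice of descriptors per node coincides with a choice per tree; the internal test data, the stopping conditions and the default parameter values below are simplifications and assumptions of this sketch, not the settings used in the thesis.

```python
import random


def fit_stump(X, y, features):
    """Best single split over the given descriptor indices (regression stump)."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No split possible (all sampled descriptors constant): predict the mean.
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]


def random_forest(X, y, n_trees=100, n_predictors=2, subsample=0.5, seed=1):
    """Build stumps on random subsamples of compounds and descriptors."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    size = max(2, int(subsample * n))
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(size)]
        features = rng.sample(range(p), n_predictors)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict(forest, row):
    """Average the predictions of all trees in the forest."""
    predictions = [lm if row[j] <= t else rm for j, t, lm, rm in forest]
    return sum(predictions) / len(predictions)


# Eight hypothetical compounds with two hypothetical descriptors.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0], [8, 1]]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
forest = random_forest(X, y, n_trees=25, n_predictors=2)
print(predict(forest, [1, 0]), predict(forest, [8, 1]))
```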

2271 

2272 3.3.7. Multivariate Adaptive Regression Splines (MARS) Model 

2273 MARS is a non-parametric regression procedure that constructs a relation between 

2274 the dependent and independent variables from a set of coefficients and basis 

2275 functions that are entirely derived from the regression data (Friedman, 1991). It is a 

2276 very flexible technique that automatically models non-linearities and interactions 

2277 between variables. The non-linearities (knots) are represented by the so called 

2278 ‘hinge functions’ which are expressions of the type ‘max (a, b)’ where the value of 

2279 this expression will be a if a>b, or else b. Interactions between variable pairs 

2280 can also be expressed in the formula. A MARS model is developed by stepwise 

2281 addition of basis functions in pairs (forward pass) to reduce the sum-of-squares 

2282 residual error, and then step-by-step removal of the least significant terms to 

2283 achieve better generalisation (backward pass). The model created by this tool is 

2284 easy to understand, compared to some other data mining models such as boosted 

2285 trees. This tool sometimes is used as a method for finding the important predictor 

2286 variables as important information for another analysis. The MARSplines 

2287 algorithm picks up only those basis functions (and those predictor variables) that 

2288 makes a "sizeable" contribution to the prediction. Basis functions use a non- 

2289 parameter (break point) to find non-linear relationships. Increasing the maximum 

2290 number of basis functions gives the potential for a more complex model. Using the 

2291 degree of interaction we can specify no interaction up to a very high order 

2292 interaction term. Model subsets are compared using the GCV criterion (Generalized 

2293 Cross-Validation). GCV is the adjusted form of residual sum-of-squares that 

2294 penalises the addition of knots in order to limit the model flexibility and overfitting. 

2295 In this investigation, in addition to using all the molecular descriptors in MARS 

2296 analysis and allowing MARS to select the significant descriptors, we performed a 

2297 pre-processing feature selection to select a limited number of molecular descriptors 

2298 for use in MARS analysis. The feature selection methods were different for 

2299 different datasets and have been explained in the relevant chapters. 
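
The hinge functions and the GCV criterion described in this section can be illustrated with a minimal sketch. The coefficients, knot position and penalty value are hypothetical, and the GCV expression is the commonly quoted form (effective number of parameters = number of terms + penalty x (number of terms - 1) / 2), which is assumed here and may differ in detail from the implementation in the software used.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) or, for direction=-1, max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-variable MARS model given (coefficient, knot, direction) terms."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


def gcv(rss, n_cases, n_terms, penalty=3.0):
    """Generalized cross-validation: residual error inflated by model complexity."""
    effective = n_terms + penalty * (n_terms - 1) / 2
    return (rss / n_cases) / (1.0 - effective / n_cases) ** 2


# Hypothetical model with a mirrored pair of hinge functions at a knot of 3.0.
terms = [(2.0, 3.0, 1), (-1.0, 3.0, -1)]
print(mars_predict(5.0, 1.0, terms), mars_predict(1.0, 1.0, terms))
# For the same residual sum of squares, a model with more terms is penalised more.
print(gcv(10.0, 100, 5), gcv(10.0, 100, 9))
```

The mirrored pair gives a different slope on either side of the knot, which is how the model represents a non-linear relationship with piecewise-linear segments.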


2300 

2302 4. QSAR Models for Biliary Excretion 

2303 

2304 4.1. Introduction 

2305 Biliary excretion is an important route for the elimination of some drugs and their 

2306 metabolites (Rosenbaum, 2011). Although the liver is generally identified with its 

2307 role in metabolism, one of the most important functions of the liver is formation of 

2308 bile which is then stored in the gallbladder and discharged into the duodenum upon 

2309 ingestion of food, with bile carrying also cholephilic xenobiotics. Bile which is a 

2310 composition of bile acids and other components such as phospholipids, bilirubin 

2311 and cholesterol is formed in the hepatocytes and is actively discharged across the 

2312 canalicular membrane into canaliculus (Rollins and Klaassen, 1979). Once bile is 

2313 released into the intestine, some metabolites and unchanged drugs continue their 

2314 way of elimination through the faeces. Others, for example lipid-soluble drugs, are 

2315 reabsorbed from the intestine and move to the systemic circulation (Rollins and 

2316 Klaassen, 1979). This enterohepatic circulation affects pharmacokinetics by 

2317 keeping the plasma concentration of drugs high (Rosenbaum, 2011). Enterohepatic 

2318 cycling and biliary elimination can continue until the compound is ultimately 

2319 eliminated from the body by faecal or renal excretion or metabolism. Uptake from 

2320 sinusoidal blood and then secretion of bile salts across the canalicular hepatocyte 

2321 membrane are the major factors controlling the rate of bile secretion. 

2322 Basolateral bile salt uptake is driven by Na+-dependent and Na+-independent 

2323 uptake systems (Kullak-Ublick et al., 2000). The main sodium-dependent 

2324 bile salt transporters are the Na+-taurocholate co-transporting polypeptides 

2325 (human and rat). On the other hand, the Na+-independent uptake of bile salts cannot 

2326 be attributed to the function of a single transport system and several carrier systems 

2327 have been implicated, including the sulphate/anion exchanger, the dicarboxylate/anion 

2328 exchanger and the OH-/cholate exchanger. In rats, the organic anion transporting 

2329 polypeptides (Oatp1, Oatp2 and Oatp4) have been indicated as the main sodium-independent 

2330 uptake proteins (Kullak-Ublick et al., 2000). The organic cation and 

2331 organic anion transporters (OCT and OAT, respectively) also play important roles 

2332 in the initial sinusoidal influx of drugs into hepatocytes (van Montfoort et al., 

2333 2003). These transporters have wide substrate specificities for a range of exogenous 

2334 and endogenous substrates (Leabman et al., 2003). OCT1 can be found abundantly 

2335 in hepatocytes and may be seen as the most important transporter for distribution of 

2336 cationic compounds into the liver from sinusoidal membrane (Nies et al., 2009). 

2337 Canalicular bile secretion is an osmotic process in which active excretion of organic 

2338 solutes into the bile canaliculus is the main driving force for the passive inflow of 

2339 water, electrolytes and nonelectrolytes from hepatocytes (Trauner and Boyer, 

2340 2003). While products of the multidrug resistance gene family (Mdr), namely bile 

2341 salt export pumps, Bsep (rat) and BSEP (human), transport monovalent bile salts 

2342 (Rollins and Klaassen, 1979), excretion of non-bile salt organic anions and divalent 

2343 sulphate or glucuronide bile salts is carried mainly by the multidrug resistance 

2344 protein 2 (MRP2). The bile salt export pump has a limited role in drug excretion. 

2345 However, drug inhibition of this pump can lead to hepatotoxicity (Morgan et al., 

2346 2010). Another member of this family, P-glycoprotein, also known as multidrug 

2347 resistance protein 1, actively effluxes xenobiotics into the bile (Schinkel et al., 

2348 1997). Breast cancer resistance protein (BCRP/ABCG2) is also involved in the 

2349 transport of a range of drugs. For example, nitrofurantoin has a very high biliary 

2350 excretion predominantly mediated by BCRP (Merino et al., 2005b). Other 

2351 basolateral isoforms of the multidrug resistance-associated protein, MRP4 and 

2352 MRP3, provide alternative routes for the elimination of organic anions from 

2353 hepatocytes into the systemic circulation (Kullak-Ublick et al., 2000). Properties of 

2354 the chemical structure as well as the characteristics of the liver such as specific 

2355 active transport sites within the liver cell membranes are the main factors which 

2356 determine the elimination of xenobiotics via the biliary tract (Rollins and Klaassen, 

2357 1979). Despite the various transport systems involved in the biliary elimination of 

2358 xenobiotics, there have been a number of attempts to identify common molecular 

2359 features of highly excreted compounds. Molecular weight (MW) has been 

2360 suggested as an important factor in biliary excretion levels of compounds. Anionic 

2361 compounds with MW higher than 325±50 Da in rats, 400±50 Da in guinea 

2362 pigs, 475±50 Da in rabbits and 500±50 Da in humans have been suggested as 

2363 good candidates for biliary excretion (Hirom et al., 1972). Most compounds with 

2364 lower molecular weights are quickly cleared through the kidneys and are not 

2365 excreted in the bile (Abou-El-Makarem et al., 1967). Bile is rich in endogenous 

2366 organic anionic substrates (e.g., steroid hormones), organic cations (such as 

2367 quaternary ammonium), bilirubin and bile acids (Rollins and Klaassen, 1979). 

2368 Moreover, the excretion route of anionic xenobiotics and some antibacterials is through 

2369 the bile (Crosignani, 1996). Principally, for organic cationic compounds, biliary 

2370 elimination depends on the molecular volume (Neef et al., 1984), lipophilicity of 

2371 the compound and the number of cationic groups (Feitsma, 1989). 

2372 Biliary excretion has major significance in determining the pharmacokinetic 

2373 profiles of drugs. In several disease states, the excretion of drugs through bile is 

2374 affected and toxicities may arise (Rosenbaum, 2011; Rollins and Klaassen, 1979). 

2375 Knowledge of biliary excretion levels of compounds can help in identifying any 

2376 possible mechanisms of hepatobiliary toxicity and potential drug-drug interactions. 

2377 Therefore, an insight into the structural profile of cholephilic compounds through 

2378 accurate modelling of the biliary excretion is important for predicting clinical 

2379 pharmacokinetics. This is of particular value during the earlier stages of drug 

2380 discovery, where low-cost estimation procedures are required. Quantitative 

2381 structure-activity relationships (QSARs) employ data mining techniques to explore 

2382 the relationships between biological properties of interest, e.g. pharmacokinetic 

2383 parameters of drugs, and the properties of the molecular structures (Ghafourian et 

2384 al., 2006). Recently, a QSAR model developed using 2D molecular descriptors 

2385 showed good prediction ability for a set of literature biliary excretion data measured 

2386 under the same experimental model (Luo et al., 2010). However, re-evaluation of 

2387 this simple model showed that the statistical significance of the model is lost when 

2388 it is used for the prediction of a wide set of external compounds (Gandhi and 

2389 Morris, 2012), suggesting that hepatobiliary excretion cannot be captured by simple 

2390 physicochemical descriptors when examining chemically dissimilar compounds. 

2391 Unfortunately, the availability of in vivo biliary excretion data, which are necessary for 

2392 modelling, is very limited. Yang et al. (2009) have recently compiled a 

2393 large dataset of percentage of dose eliminated in the bile in rats and humans. This 

2394 offers an excellent resource for a detailed study on the structural determinants for 

2395 high biliary excretion. Using this dataset, Yang and co-workers suggested a MW 

2396 threshold of 400 Da for anions in rats and 475 Da for anions in humans. They also 

2397 developed linear regression models for human and rat. The aim of this study was to 

2398 use an expanded dataset and incorporate non-linear methods to develop statistically 
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2399 valid QSAR models. Specifically, classification and regression tree (C&RT) is a 

2400 flexible and yet simple and interpretable technique with embedded feature selection 

2401 that selects the most significant molecular descriptor for partitioning the data into 

2402 smaller subsets of similar observations (Breiman et ah, 1984). This rule-based 

2403 technique is a decision tree that splits the data in a recursive manner until the subset 

2404 has all the same value of the target (dependent) variable, or when no gain in the 

2405 prediction accuracy is achievable by further splitting. In this study, we aimed at 

2406 using regression trees and two ensemble methods that construct many such decision 

2407 trees and return the consensus prediction by the trees, namely random forest and 

2408 boosted trees. The prediction accuracy of the models and the molecular descriptors 

2409 selected by these methods were compared in order to clarify the structural elements 

2410 controlling the biliary excretion. Moreover, regression trees were used to examine 

2411 the significance of molecular weight and presence of carboxylic acid groups and to 

2412 find the statistically significant threshold values. In this case, regression trees are 

2413 useful since they can be used interactively so that a molecular descriptor of choice 

2414 can be incorporated at any split level and the analysis may determine the 

2415 statistically significant threshold value of the descriptor for splitting the data. 
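The threshold search that a regression tree performs for one descriptor at one split can be sketched as follows. This is a minimal illustration that assumes the usual least-squares splitting rule, where the threshold minimising the summed squared error of the two child nodes is chosen. It is not the STATISTICA implementation, and the function names and example values are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(descriptor, response, min_cases=1):
    """Return (threshold, cost) of the least-squares split, or None if no split is possible."""
    pairs = sorted(zip(descriptor, response))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical descriptor values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if len(left) < min_cases or len(right) < min_cases:
            continue
        cost = sum_squared_error(left) + sum_squared_error(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical molecular weights and log BE% values, for illustration only.
mw = [100, 200, 300, 400, 500, 600]
log_be = [0.1, 0.2, 0.1, 1.5, 1.6, 1.4]
print(best_split(mw, log_be))
```

In an interactive analysis, the descriptor passed to such a search is chosen by the analyst, and only the threshold is determined from the data.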

2416 

2417 4.2. Methods 

2418 In this investigation, RT models were built with log BE% as the dependent variable, 

2419 and the predictors were selected by the statistical analysis from all the molecular 

2420 descriptors used in the analysis. “Observed” refers to the log percentage of intact 

2421 dose excreted into the bile in in vivo studies. In all statistical analyses, the logarithm 

2422 of percentage dose excreted (log BE%) was used instead of the 

2423 percentage of dose excreted. This was because log BE% was closer to a normal distribution 

2424 than BE%, as indicated by a comparison of skewness. 
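The skewness comparison mentioned above can be sketched as below. The example assumes a base-10 logarithm and the moment-based (population) skewness coefficient. The BE% values are invented purely for illustration and are not taken from the dataset.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical BE% values (not from the thesis dataset); zeros are excluded
# because their logarithm is undefined.
be_percent = [0.05, 0.3, 1.2, 2.5, 4.0, 9.0, 15.0, 38.0, 72.0, 95.0]
log_be_percent = [math.log10(v) for v in be_percent]

print(skewness(be_percent), skewness(log_be_percent))
```

A skewness closer to zero after the transformation is the kind of evidence that would support modelling log BE% rather than BE%.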

2425 

2426 4.2.1. The Dataset 

2427 The biliary excretion dataset was that collated by Yang et al. (2010), available at 

2428 http://www.buffalo.edu/~memorris, with the addition of some new data from the 

2429 literature (Hirom et al., 1972; Abou-El-Makarem et al., 1967; Hughes et al., 1973; 

2430 Fahrig et al., 1989; Funakoshi et al., 2005; Luo et al., 2010; Scott et al., 1994; 

2431 Matsushita et al., 1992; Prueksaritanont et al., 2003; Niinuma et al., 1999; 

2432 Vaidyanathan and Boroujerdi, 2000; Fukuda et al., 2008; Chu et al., 1997; Wu et 

2433 al., 2008; Wright and Line, 1980; Chan et al., 2002; O’Reilly et al., 1971; 

2434 Watkins and Dykstra, 1987; Sasabe et al., 1999; Weinz et al., 2009; Mohri et al., 

2435 2005; Kemmerer et al., 1979; Itagaki et al., 2003; Evanchik et al., 2009; Krishna 

2436 et al., 1999; Broggini et al., 1980; Israel et al., 1978; Arimori et al., 2003; Itoh et 

2437 al., 2004). It consists of in vivo biliary excretion expressed as percentage of dose 

2438 excreted as the parent compound intact through the bile (BE%) for 217 compounds 

2439 in rat after iv or intraperitoneal administration of the compound. The compounds 

2440 are from different chemical classes such as bile acids, statins, dyes, penicillins and 

2441 cephalosporins, macrolide antibiotics, quinolone antibiotics, NSAIDs, thrombin 

2442 inhibitors, analgesics, anti-cancer drugs such as doxorubicin, folate, peptides, anti- 

2443 HIV agents, quaternary ammoniums, sulphanilamide and arylaminosulphonic acids. 

2444 Biliary excretion in the database is reported as the percentage of drug excreted 

2445 through bile, or as bile clearance. 

2446 Where several values were available for the same compound the mean values were 

2447 used. Table 4.1 shows an example of this. 

2448 Table 4.1. Example of different values reported for percentage dose excreted in bile 

2449 for methotrexate 

Compound       % Dose excreted in bile as parent compound   Collection period
Methotrexate   72                                           480 min (8 hr)
Methotrexate   84.3                                         600 min (10 hr)
Methotrexate   58.9                                         720 min (12 hr)
Methotrexate   64                                           1440 min (24 hr)
Average        69.8
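As a worked check of Table 4.1, the averaging step can be written out as below. The sketch assumes that the mean is taken over the reported BE% values before the logarithm is applied, and that the logarithm is base 10. Neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
import math
from statistics import mean


def mean_log_be(reported_be_percent):
    """Average replicate BE% values, then return (mean BE%, log10 of the mean)."""
    mean_be = mean(reported_be_percent)
    return mean_be, math.log10(mean_be)


# The four methotrexate values listed in Table 4.1.
methotrexate = [72.0, 84.3, 58.9, 64.0]
mean_be, log_be = mean_log_be(methotrexate)
print(round(mean_be, 1), round(log_be, 3))
```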


2450 

2451 This dataset is presented in Appendix I, including all the references. 

2452 

2453 4.2.2. Model Development and Validation 

2454 In this study, QSARs were established to relate the biliary excretion of compounds 

2455 (log BE%) to the molecular descriptors. Molecular descriptors were calculated 

2456 according to the procedures explained in section 3.1. Before building the models, 

2457 the molecular descriptors were checked to find and discard those columns 

2458 containing more than 98% constant values or more than 28 (out of 217) missing 

2459 values. The total number of molecular descriptors used in all statistical analyses 

2460 was 387. 
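The descriptor pre-filtering rule described above can be sketched as follows. The sketch assumes that "constant" means the single most frequent value, that the 98% fraction is taken over all compounds, and that missing values are represented as `None`. The function names are hypothetical and do not correspond to any STATISTICA routine.

```python
from collections import Counter


def keep_descriptor(column, max_constant_fraction=0.98, max_missing=28):
    """Return True if a descriptor column passes both filtering criteria."""
    missing = sum(1 for v in column if v is None)
    if missing > max_missing:
        return False
    present = [v for v in column if v is not None]
    if not present:
        return False
    # Count of the single most frequent value among the non-missing entries.
    most_common_count = Counter(present).most_common(1)[0][1]
    return most_common_count / len(column) <= max_constant_fraction


def filter_descriptors(table):
    """Keep only the descriptor columns (name -> list of values) that pass the filter."""
    return {name: col for name, col in table.items() if keep_descriptor(col)}


# Hypothetical columns for 217 compounds, for illustration only.
table = {
    "nearly_constant": [0.0] * 215 + [1.0] * 2,
    "too_many_missing": [None] * 29 + [float(i) for i in range(188)],
    "informative": [float(i) for i in range(217)],
}
print(list(filter_descriptors(table)))
```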

2461 The compounds were divided into a training set and an external validation set. Out of 

2462 the 217 compounds in the rat biliary excretion dataset, 9 had a biliary excretion of 0%, 

2463 so log BE% could not be calculated for them and they were left out. The remaining 208 

2464 compounds were ordered according to BE%, and from every set of five compounds, four were 

2465 allocated to the training set and one, chosen randomly, to the external validation set. 

2466 In this way, the training set consisted of 168 compounds and the external validation 

2467 set of 40 compounds. For the analytical methods that required 

2468 parameter optimization, a fraction of the training set compounds was randomly 

2469 assigned to an internal validation set or, alternatively, cross-validation was used if 

2470 the option was available in the statistical software. STATISTICA Data Miner was 

2471 the software used for statistical analysis. The general idea of V-fold 

2472 cross-validation is to divide the overall sample into V folds, each of which is held out in turn. 

2473 V-fold cross-validation is used in various analytical procedures to avoid 

2474 overfitting of the data (Burden, 1989). For the internal validation set, where 

2475 applicable, the risk estimate and standard error were calculated in STATISTICA 

2476 and used as the performance indicators. The risk estimate is calculated as the 

2477 proportion of residual variance incorrectly estimated by the model. The standard error 

2478 measures the error of the prediction. 
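The allocation of compounds into training and external validation sets can be sketched as below: compounds are ranked by BE%, and one compound from each consecutive group of five is held out at random. How an incomplete final group was handled is not described in the text, so sending it to the training set is an assumption, as are the function name and the fixed random seed.

```python
import random


def ranked_group_split(be_values, group_size=5, seed=0):
    """Split compound indices into (training, validation) lists after ranking by BE%."""
    rng = random.Random(seed)
    order = sorted(range(len(be_values)), key=lambda i: be_values[i])
    training, validation = [], []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        if len(group) < group_size:
            # Assumption: an incomplete last group goes entirely to training.
            training.extend(group)
            continue
        held_out = rng.choice(group)
        validation.append(held_out)
        training.extend(i for i in group if i != held_out)
    return training, validation


# Hypothetical BE% values for 23 compounds, for illustration only.
be_values = [float(v) for v in range(1, 24)]
training, validation = ranked_group_split(be_values)
print(len(training), len(validation))
```

Because every group of five spans a narrow range of BE%, both sets cover a similar range of the response, which is the purpose of ranking before the random draw.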

2479 Several linear and non-linear methods were used for the QSAR model 

2480 development. These included stepwise regression analysis, 

2481 Classification and Regression Trees (C&RT), Boosted trees (BT) and 

2482 Random Forest (RF). 

2483 The methods have been explained in section 3.2. In C&RT analysis, several 

2484 stopping criteria were examined, including the default settings in STATISTICA. 

2485 The default stopping criteria were a minimum number of cases of 21 and a 

2486 maximum number of nodes of 100. The default V-value of 10 was used in the 

2487 V-fold cross-validation and the risk estimate was used to check the reliability of the 

2488 resulting RTs. In BT analysis, the default values for learning rate, the number of 

2489 additive terms, random test data proportion and subsample proportion were 0.1, 

2490 200, 0.2 and 0.5 respectively. Various subsample proportions of 0.4, 0.45, 0.50, 

2491 0.55 and 0.60 were examined in combination with the learning rates of 0.1 and 

2492 0.05. The best two models were selected based on the performance indicators for 

2493 the internal validation set. In RF analysis, various subsample proportions of 0.40, 

2494 0.45, 0.50, 0.55 and 0.60 were examined. Different numbers of trees were tested at 

2495 20, 50, 80, 100 and 200. The random test data proportion was 0.2 for the internal 

2496 validation. The default settings were used for stopping conditions including 

2497 minimum number of cases, maximum number of levels, minimum number in child 

2498 node and the maximum number of nodes of 6, 10, 5 and 100, respectively. The best 

2499 model was selected based on the estimation error for the internal test data. 
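The parameter settings examined for the BT and RF models can be enumerated as a simple grid. The sketch below assumes that the settings were combined factorially, which the text states for BT (subsample proportion with learning rate) but only implies for RF (subsample proportion with number of trees). The scoring function is a hypothetical stand-in for the internal validation error reported by the modelling software.

```python
from itertools import product

SUBSAMPLE_PROPORTIONS = [0.40, 0.45, 0.50, 0.55, 0.60]
LEARNING_RATES = [0.1, 0.05]
NUMBERS_OF_TREES = [20, 50, 80, 100, 200]


def boosted_tree_grid():
    """All learning-rate / subsample combinations examined for boosted trees."""
    return [{"learning_rate": lr, "subsample": s}
            for lr, s in product(LEARNING_RATES, SUBSAMPLE_PROPORTIONS)]


def random_forest_grid():
    """All tree-count / subsample combinations (factorial design is an assumption)."""
    return [{"n_trees": n, "subsample": s}
            for n, s in product(NUMBERS_OF_TREES, SUBSAMPLE_PROPORTIONS)]


def select_best(grid, internal_validation_error):
    """Pick the setting with the lowest internal validation error."""
    return min(grid, key=internal_validation_error)


def toy_error(setting):
    """Hypothetical error surface, used only to make the example runnable."""
    return abs(setting["subsample"] - 0.5) + setting.get("learning_rate", 0.0)


print(len(boosted_tree_grid()), len(random_forest_grid()))
print(select_best(boosted_tree_grid(), toy_error))
```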

2500 

2501 4.3. Results of QSAR Models for Biliary Excretion 

2502 A total of 387 2D (e.g. kappa shape indexes, molecular connectivity indexes and 

2503 electrotopological state indexes) and 3D molecular descriptors were used for the 

2504 QSAR model development. The method of data allocation into training and test sets 

2505 outlined above ensured that similar biliary excretion and molecular property 

2506 spaces were covered by both the training and the validation sets. BE% values 

2507 ranged between 0.048 and 100 with mean log BE% values for the training and 

2508 validation sets at 1.04 and 1.01, respectively. LogP was between -3.44 and 18.8 for 

2509 the training set, and -3.17 and 7.83 for the validation set with similar mean values 

2510 of 1.81 and 1.83, respectively. Molecular weights of the compounds were between 

2511 122 and 1215 Da for the training set and 94 and 1255 Da for the validation set, with 

2512 mean values of 457 and 390, respectively. The scores plot from principal component 

2513 analysis using all the molecular descriptors also indicates a similar chemistry space 

2514 for the two sets (Figure 4.1). 

Figure 4.1. Scores plot of PCA using all 387 molecular descriptors 


4.3.1. Regression Models 

Linear regression equations are the simplest and most straightforward QSAR 
techniques. They have the benefit of easy interpretation, which can provide some 
mechanistic insight into the process under investigation (Patel et al., 2002). 
Stepwise regression analysis using in vivo rat biliary excretion data as the 
dependent variable resulted in the MLR (1) model below, in which the number of 
molecular descriptors is limited to eight. The statistical terms of the equation are N, 
the number of compounds; R-Sq, the squared correlation coefficient; S, the standard 
deviation; F, Fisher’s statistic; and P, the P value. Observed versus calculated log 
BE% by this equation has been plotted (Figure 4.2), with training and validation 
sets identified in the plot. 

2531 MLR (1) model 

2532 Log BE% = 2.09 + 0.00129 Vsurf_HB4 - 9.33 PEOE_RPC+ - 0.0574 SsCH3 - 

2533 0.377 fU - 0.00503 SlogP_VSA0 - 0.0573 SsssCH + 0.0403 AM1_dipole + 0.378 

2534 SddssS_acnt 

2535 N = 168 S = 0.489 R-Sq = 0.608 F = 30.9 P = 0.000 

2536 Molecular descriptors of this equation are not intercorrelated (R² < 0.4). 

2537 R-Sq, S, F and P for the validation set are 0.47, 0.478, 31.20 and 0.000, respectively. 
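To show how the MLR (1) equation is applied, the sketch below evaluates it for a set of descriptor values. The coefficients are those reported in the equation. The base-10 logarithm is an assumption (consistent with BE% values of up to 100 giving log BE% of up to 2), and the function names and the example descriptor values are hypothetical.

```python
MLR1_INTERCEPT = 2.09
MLR1_COEFFICIENTS = {
    "Vsurf_HB4": 0.00129,
    "PEOE_RPC+": -9.33,
    "SsCH3": -0.0574,
    "fU": -0.377,
    "SlogP_VSA0": -0.00503,
    "SsssCH": -0.0573,
    "AM1_dipole": 0.0403,
    "SddssS_acnt": 0.378,
}


def predict_log_be(descriptors):
    """Evaluate the MLR (1) equation for one compound (dict of descriptor values)."""
    missing = set(MLR1_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return MLR1_INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in MLR1_COEFFICIENTS.items()
    )


def predict_be_percent(descriptors):
    """Back-transform the prediction, assuming a base-10 logarithm."""
    return 10 ** predict_log_be(descriptors)


# Hypothetical compound: fully unionised, all other descriptors set to zero.
example = {name: 0.0 for name in MLR1_COEFFICIENTS}
example["fU"] = 1.0
print(round(predict_log_be(example), 3))
```

Setting only fU to 1 lowers the predicted log BE% by 0.377 relative to the intercept, which mirrors the negative effect of the unionised fraction discussed in the text.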

2538 Table 4.2 gives a brief description of molecular descriptors used in this model. 

2539 Vsurf_HB4 is the first molecular descriptor selected by the analysis and it indicates 

2540 that compounds with high H-bond donor capacity have higher biliary excretion 

2541 level. AM1_dipole (dipole moment) is the other polarity descriptor which has a 

2542 positive effect. On the other hand, the equation shows that drugs with greater 

2543 relative positive partial charge (PEOE_RPC+) have lower biliary excretion. The 

2544 value of this descriptor is large for small acidic molecules such as benzoic acid and 

2545 salicylic acid, and therefore the small size of such compounds may be the reason 

2546 for the reduced biliary excretion. In this equation, fU with a negative coefficient 

2547 indicates that compounds with higher unionised fraction at pH 7.4 have lower 

2548 biliary excretion. In other words, although according to fU, acidity and basicity 

2549 (dissociation in general) increase the biliary excretion of compounds, this is true 

2550 only for large dissociated molecules. The positive effect of polarity and dissociation 

2551 on biliary excretion is in agreement with the literature, where for example polar 

2552 surface area (Gandhi and Morris, 2012) and an acidity indicator (Luo et al., 2010; 

2553 Chen et al., 2010) have been included in linear QSAR models. 

Figure 4.2. Observed vs predicted log BE% using MLR (1) 


Also according to this equation, compounds containing many methyl groups 
(SsCH3) and those that are highly branched, containing >CH- groups (SsssCH), 
have lower biliary excretion. Examples are macrolide antibiotics (e.g. telithromycin, 
azithromycin, erythromycin, actinomycin), the muscle relaxant pipecuronium and the 
chemosensitizer PSC 833. The predominant elimination route for these compounds 
is metabolism (Lee and Lee, 2007; Amacher et al., 1991; Lam et al., 2006; Lahiri 
et al., 1970; Vereczkey and Szporny, 1980; Song et al., 1998), except for 
pipecuronium and azithromycin, for which the main elimination routes are renal and 
biliary excretion, respectively. 

In this equation, SlogP_VSA0 shows the negative impact of the presence of atoms 
with a LogP(o/w) contribution of less than or equal to -0.4. SddssS_acnt indicates the 
direct effect of sulphate or sulphonamide groups. Sulphate and sulphonamide 
groups are found in sulphonamide drugs such as succinylsulphathiazole, dyes such 
as methyl orange and sulphate conjugates such as estrone 3-sulphate which may be 
substrates of MRP2 or BCRP (Zamek-Gliszczynski et al., 2006). 

2572 Table 4.2. A brief description of the most important molecular descriptors selected 

2573 and used by the models. 

Descriptor                       Model                               Description
a_acc                            RF (1)                              Number of H-bond acceptor atoms.
a_hyd                            BT (1)                              Number of hydrophobic atoms.
AM1_dipole                       MLR (1), RT (1)                     Dipole moment calculated using the AM1 Hamiltonian.
BCUT_PEOE_0                      RF (1)                              The BCUT descriptor calculated from the eigenvalues of a modified adjacency matrix. The resulting eigenvalues are sorted and the smallest, 1/3-ile, 2/3-ile and largest eigenvalues are reported, in this case the 2/3-ile. The diagonal takes the value of the PEOE partial charges.
CASA-                            I-tree (2)                          Negative charge weighted surface area, ASA- times max{q_i < 0}.
chi1                             RF (1)                              First order molecular connectivity index (Hall et al., 2007).
COOH                             I-tree (2)                          Indicator variable for the presence of a carboxylic acid group in the molecular structure.
Docking energy (MOE)             RF (1)                              Docking score (kcal/mol) for enzyme-ligand docking of the compounds into the active site of P-glycoprotein (Aller et al., 2009), calculated using MOE software.
FASA_H                           RT (1)                              Fractional ASA_H (water accessible surface area of all hydrophobic atoms), calculated as ASA_H / ASA.
FCASA-                           I-tree (1)                          Fractional CASA-, calculated as CASA- / ASA.
fU                               MLR (1), RT (1)                     Fraction of the compound that is unionised.
GCUT_SLOGP_1                     RT (1)                              The GCUT descriptors are calculated from the eigenvalues of a modified graph distance adjacency matrix. Each ij entry of the adjacency matrix takes the value 1/sqrt(d_ij), where d_ij is the (modified) graph distance between atoms i and j. The resulting eigenvalues are sorted and the smallest, 1/3-ile, 2/3-ile and largest eigenvalues are reported. The diagonal takes the value of the atomic contribution to logP.
Kier2                            BT (1), BT (2)                      Second order kappa shape index: (n-1) / m (Hall et al., 2007).
Kier3                            BT (1), BT (2)                      Third order kappa shape index: (n-1) / m (Hall et al., 2007).
KierA1                           I-tree (1)                          First order alpha modified shape index: s (s-1)^2 / m^2 where s = n + a (Hall et al., 2007).
KierA3                           BT (1), BT (2)                      Third order alpha modified shape index: (n-1)(n-3)^2 / p3^2 for odd n, and (n-3)(n-2)^2 / p3^2 for even n, where s = n + a (Hall et al., 2007).
LogD (5.5)                       BT (2)                              Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 5.5.
LogD (6.5)                       RT (1), I-tree (1), BT (1), BT (2)  Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 6.5.
LogD (7.4)                       BT (1), BT (2)                      Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 7.4.
LogD (10)                        I-tree (2), BT (2)                  Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 10.
MW                               I-tree (1), RF (1)                  The molecular weight.
N ratio                          RT (1)                              The weight ratio of nitrogen atoms in the molecule.
PEOE_PC-                         I-tree (2)                          Total negative partial charge.
PEOE_RPC+                        MLR (1), BT (2)                     Relative positive partial charge: the largest positive atomic charge divided by the sum of the positive partial charges.
PEOE_VSA_NEG                     I-tree (1)                          Total negative van der Waals surface area.
PEOE_VSA-0                       RT (1)                              Van der Waals surface area of atoms with atomic charge in the range [-0.05, 0.00).
PEOE_VSA_FPPOS                   RF (1)                              Fractional positive polar van der Waals surface area. This is the sum of the VDW surface area of atoms whose partial charge is greater than 0.2.
PEOE_VSA_HYD                     BT (1), BT (2)                      Total hydrophobic van der Waals surface area. This is the sum of the van der Waals surface area of atoms whose absolute atomic charge is less than or equal to 0.2.
Q_PC+                            RF (1)                              Total positive partial charge: the sum of the positive partial charges of atoms in the molecule.
SddssS_acnt                      MLR (1)                             Count of all sulphur atoms (ddssS) E-state values in molecule.
SlogP_VSA0                       MLR (1)                             Sum of approximate accessible van der Waals surface area for atoms with atomic contribution to logP(o/w) of equal to or less than -0.4.
SMR_VSA7                         I-tree (1)                          Sum of approximate accessible van der Waals surface area for atoms with atomic contribution to molar refractivity of R_i > 0.56.
SsCH3                            MLR (1)                             Atom type electrotopological state index (sum of the E-states) for (-CH3) groups.
SsssCH                           MLR (1)                             Sum of E-State for all (>CH-) groups in molecule.
SssssC                           I-tree (1)                          Sum of all (>C<) E-State values in molecule.
TPSA                             RF (1)                              Topological polar surface area (Å²).
VAdjEq                           RF (1)                              Vertex adjacency information (equality): an atom count/bond count descriptor calculated as -(1-f) log2(1-f) - f log2 f, where f = (n^2 - m) / n^2, n is the number of heavy atoms and m is the number of heavy-heavy bonds. If f is not in the open interval (0,1), then 0 is returned.
vsa_hyd                          BT (1)                              Approximation to the sum of VDW surface areas of hydrophobic atoms (Å²).
vsurf_CW4                        I-tree (1)                          Capacity factor: the ratio of the hydrophilic surface over the total molecular surface, calculated at eight different energy levels (from -0.2 to -6.0 kcal/mol).
vsurf_EDmin3                     RT (1)                              The lowest hydrophobic energy.
vsurf_HB4, vsurf_HB5, vsurf_HB6  MLR (1), BT (1), BT (1)             H-bond donor capacity at -2.0 kcal/mol with carbonyl oxygen probe.
vsurf_ID7                        RT (1)                              Hydrophobic integy moment (the "integy moment" is defined in analogy to the dipole moment and describes the distance of the centre of mass to the barycentre of hydrophobic regions). A small integy moment indicates that the hydrophobic moieties are either close to the centre of mass or balanced at opposite ends of the molecule, so that their resulting barycentre is close to the centre of the molecule. VolSurf computes ID at eight different energy levels (from -0.2 to 1.6 kcal/mol).
vsurf_IW2                        I-tree (1), BT (2)                  Hydrophilic integy moment (see vsurf_ID7).
vsurf_W1, vsurf_W3               RF (1), RT (1)                      Hydrophilic volume.


2574 

2575 

2576 4.3.2. Regression Tree Models Using C&RT 

2577 Several RTs were generated using a combination of molecular descriptors while 

2578 cross-validation was applied. The best RTs were selected based on the standard 

2579 error for the internal test set. As seen in Table 4.3, in RT (1), molecular descriptors 

2580 were selected by C&RT analysis, while in I-tree (1), the molecular weight and in I- 

2581 tree (2), the number of carboxylic acid groups were manually imposed as the first 

2582 split descriptor using the interactive C&RT routine in STATISTICA. These models 

2583 were developed using the training set while the validation set remained external. 

2584 The RTs resulting from these trials have been presented in Figs. 4.3, 4.4 and 4.5. In 

2585 the regression trees, N is the number of compounds, Mu is the average and Var is 

2586 the variance of log BE% in each node. The molecular descriptors employed in the 

2587 trees have been explained in Table 4.2. 


2588 


[Tree graph for log BE%. Model: C&RT; 11 non-terminal nodes, 12 terminal nodes] 

2589 Figure 4.3. RT (1) developed using the training set with the descriptors selected by 

2590 C&RT 

2591 


2592 Table 4.3 describes the regression trees. 

2593 Table 4.3. Description of the Regression Trees 

Model No     Manually incorporated variables
RT (1)       None
I-tree (1)   Molecular weight
I-tree (2)   Carboxylic acid group

2595 According to RT (1), biliary excretion is much higher for compounds with large 

2596 hydrophilic volume (vsurf_W3), especially if they are ionised with fU < 0.001 

2597 (negligible unionised fractions at pH 7.4). Within the hydrophilic drugs of higher 

2598 fU values (node 7), those with higher separation of lipophilic interaction sites from 

2599 the centre of mass (vsurf_ID7 > 0.760) have higher biliary excretion. Surfactant 

2600 molecules and glucuronide conjugates are examples of such molecules with high 

2601 VolSurf integy moment (vsurf_ID7) and high biliary excretion. This branch follows 

2602 to partition the molecules further according to GCUT_SLOGP_1, with compounds 

2603 of lower hydrophobicity (node 18) and large hydrophobic interaction energy 

2604 minima (vsurf_EDmin3 > -2.60) showing high biliary excretion (node 23). 

2605 According to RT (1), the less hydrophilic drugs with vsurf_W3 values below 

2606 417.56 can be excreted heavily through the bile if they are highly dipolar 

2607 (AM1_dipole > 4.336) with a high ratio of lipophilic to total surface area (FASA_H > 0.50), 

2608 especially if they are predominantly in the ionised form at pH 7.4 (fU < 0.052). On 

2609 the other hand, compounds with low dipole moment have low biliary excretion, 

2610 especially if they are lipophilic with LogD(6.5) > 2.51 (node 9) or otherwise if they 

2611 contain a high ratio of nitrogen atoms in the molecular structure (node 15). N ratio 

2612 is low for larger alkaloids such as morphine or non-basic compounds, such as 

2613 estrone 3-sulphate, which will have moderate biliary excretion especially if they are 

2614 hydrophilic (PEOE_VSA-0 < 94.24). 

2615 I-tree (1) was a result of molecular weight being employed in the first split using 

2616 the interactive C&RT analysis in STATISTICA (Figure 4.4). The statistically 

2617 selected molecular weight threshold was 347.9 Da, with the compounds below this 

2618 weight showing lower log BE% values than the larger compounds. The tree shows 

2619 that large (MW > 347.9) hydrophilic compounds (vsurf_CW4 > 0.540) have higher 

2620 biliary excretion, particularly those with large total negative van der Waals surface 

2621 area (PEOE_VSA_NEG) and low surface area corresponding to highly polarisable 

2622 groups (SMR_VSA7), especially if they are highly branched (SssssC >-1.812). 

2623 Within this group of compounds, larger molecules with KierA1 > 21.135 will have 

2624 even higher biliary excretion. Other parameters of I-tree (1) indicate that high 

2625 hydrophilic integy moment (vsurf_IW2) (node 13) and fractional negative charge 

2626 weighted surface area (FCASA-) (node 11) would result in high log BE% value. 
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Tree graph for log BE% 

Num. of non-terminal nodes: 9, Num. of terminal nodes: 10 
Model: C&RT 



Figure 4.4. I-tree (1) developed using interactive C&RT analysis using molecular 
weight as the first descriptor. 


Recent studies by Yang and co-workers show that the presence of carboxylic acid
group(s) may indicate a trend towards increased biliary excretion (Yang et al.,
2009). Therefore, the impact of the presence of a carboxylic acid group was examined
using the interactive C&RT analysis with COOH as the first partitioning molecular
descriptor (Figure 4.5). According to I-tree (2), compounds containing at least one
carboxylic acid group have higher biliary excretion levels. Furthermore, I-tree (2)
indicated that compounds with lower total negative partial charge (PEOE_PC-) have
much higher biliary excretion (node 6). These are large hydrophilic compounds with
many negatively charged atoms. Non-acidic compounds in node 2 will have high biliary
excretion if the negative-charge weighted surface area of these molecules is high
(node 5). CASA- has an element of size as well as indicating the presence of
negatively charged groups such as sulphates.

Figure 4.5. I-tree (2) using the number of carboxyl groups (COOH) as the first
descriptor (tree graph for log BE%; C&RT model with 4 non-terminal and 5 terminal
nodes).

Table 4.4. Statistical parameters of the models for training and test sets; RT is
regression tree, BT is boosted trees and RF is random forest model

Model        Group        Risk Estimate   Standard Error
RT (1)       Train        0.112           0.040
             Validation   0.583           0.116
I-tree (1)   Train        0.229           0.034
             Validation   0.348           0.081
I-tree (2)   Train        0.323           0.050
             Validation   0.349           0.075
BT (1)       Train        0.079           0.007
             Validation   0.328           0.103
BT (2)       Train        0.078           0.007
             Validation   0.329           0.107
RF (1)       Train        0.262           0.047
             Validation   0.311           0.076

4.3.3. Boosted Trees 

The boosted trees module computes a sequence of simple trees, where each successive
tree is built to predict the residuals of the preceding trees. Analysis using various
combinations of model parameters resulted in two best models, selected on the basis
of the error level for the internal test set (Table 4.3). In models BT (1) and BT (2),
the optimal numbers of trees were 145 and 147, with a learning rate of 0.10 and
subsample proportions of 0.55 and 0.60, respectively.
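
The residual-fitting idea can be illustrated with a minimal Python sketch. It is not
the STATISTICA implementation: it boosts one-split stumps on a single descriptor, and
the function names (fit_stump, boost, predict) and any data passed to them are
hypothetical. The defaults mirror the BT (1) settings reported in this section (145
trees, learning rate 0.10, subsample proportion 0.55); treating a maximum tree size
of 3 as a single split is an assumption.

```python
import random


def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r over one descriptor x."""
    best = None
    order = sorted(set(x))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2.0  # candidate threshold halfway between neighbours
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to a constant
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, xi):
    t, ml, mr = stump
    return ml if xi <= t else mr


def boost(x, y, n_trees=145, learning_rate=0.10, subsample=0.55, seed=0):
    """Each stump is fitted to the residuals left by all previous stumps."""
    rng = random.Random(seed)
    base = sum(y) / len(y)  # start from the mean response
    pred = [base] * len(y)
    stumps = []
    k = min(len(y), max(2, int(round(subsample * len(y)))))
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), k)  # random subsample for this step
        resid = [y[i] - pred[i] for i in idx]
        stump = fit_stump([x[i] for i in idx], resid)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, xi)
                for p, xi in zip(pred, x)]
    return base, stumps


def predict(model, xi, learning_rate=0.10):
    base, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

The learning rate shrinks each stump's contribution, which is why a fairly large
number of trees is needed before the training error levels off.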

It is possible to elucidate the influential descriptors in boosted trees analysis
using the variable importance calculation. Variable importance in STATISTICA is
calculated as the relative (scaled) average value of the predictor statistic over all
trees and nodes; hence these values reflect the strength of the relationship between
the predictors and the dependent variable of interest over the successive boosting
steps (STATISTICA help file, 2009). Table 4.2 includes the ten most important
molecular descriptors of the BT (1) and BT (2) models. Some of the descriptors used
by the BT models are those already observed in the RT and MLR models. For example,
LogD (6.5) is present in two RT models and is amongst the ten most significant
descriptors of both BT models. Other descriptors selected by these models are
topological/size descriptors (KierA3, Kier2 and Kier3) and other lipophilicity
descriptors such as LogD at different pH values and vsurf descriptors. Table 4.4
shows the statistical parameters of these models. Graphs of average squared error
against number of trees for the training and cross-validated test sets can be found
in Figures 4.6 and 4.7 for BT (1) and BT (2), respectively.

Figure 4.6. Average squared error of log BE% against the number of trees in the
boosted trees model BT (1) for the training and internal test set (optimal number of
trees: 145; maximum tree size: 3).

Figure 4.7. Average squared error of log BE% against the number of trees in the
boosted trees model BT (2) for the training and internal test set (optimal number of
trees: 147; maximum tree size: 3).


4.3.4. Random Forest 


In RF, the number of trees specifies the number of simple regression trees to be 
computed in successive forest building steps. The model development used the 
default values of the software with the number of trees set at 100. The graph of 
average squared error against number of trees for training and cross-validated test 
sets indicates that the test and training set errors reach a plateau at around 10-15 
trees (see Figure 4.8). The best model was achieved with a subsample proportion of 
0.60, random test data proportion of 0.2 and number of trees of 100. 


Figure 4.8. Average squared error of log BE% against the number of trees in the
random forest model (RF) for the training and internal test set (number of trees:
100; maximum tree size: 100).

Table 4.2 includes a description of the ten most significant descriptors employed in 
this model. Table 4.4 gives a summary of the statistical parameters of the RF 
model. 

4.3.5. Validation of the Models 

All models were validated using the same external validation set, which had been set
aside and not used at any stage of model development. Table 4.5 shows the prediction
accuracy of the QSAR models in external validation in terms of the mean absolute
error and the number of outliers. In addition, an average estimate of log BE% from
all regression trees (RT (1), I-tree (1) and I-tree (2)) was calculated and compared
with the observed values to investigate any possible improvement in prediction
accuracy. Table 4.5 gives the performance of this estimation method (consensus RTs).
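
The three quantities used in this validation can be sketched in Python as follows.
This is an illustrative sketch rather than the procedure actually run in STATISTICA:
the function names are hypothetical, and the outlier cut-off of 0.6 log units is
borrowed from the caption of Table 4.7 on the assumption that the same criterion
underlies the outlier counts in Table 4.5.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute error between observed and predicted log BE% values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def count_outliers(observed, predicted, threshold=0.6):
    """Number of compounds whose absolute error exceeds the threshold.

    The default of 0.6 log units is an assumption (see the lead-in).
    """
    return sum(1 for o, p in zip(observed, predicted) if abs(o - p) > threshold)


def consensus(*model_predictions):
    """Average the predictions of several models compound by compound."""
    return [sum(vals) / len(vals) for vals in zip(*model_predictions)]
```

A consensus estimate for the regression trees would then be obtained by passing the
three lists of tree predictions to consensus and scoring the result with the other
two functions.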

Table 4.5. Summary of the prediction accuracy of the QSAR models

Model           MAE for training set   MAE for validation set   Outliers
MLR (1)         0.377                  0.483                    11
RT (1)          0.304                  0.373                    6
I-tree (1)      0.345                  0.451                    10
I-tree (2)      0.424                  0.468                    12
Consensus RTs   0.319                  0.383                    7
BT (1)          0.229                  0.412                    8
BT (2)          0.226                  0.417                    7
RF (1)          0.403                  0.496                    14


4.4. Discussion 

Biliary excretion can play a significant role in the elimination of drugs, and its
prediction is therefore an important target in drug discovery. In the pharmaceutical
industry, drug candidates are routinely tested in animal studies to measure the
extent of biliary excretion and the propensity for enterohepatic cycling, both of
which have significant roles in the pharmacokinetics of a drug. In drug discovery, a
reliable, user-friendly and low-cost model based on computer-generated molecular
properties can reduce the number of high-cost animal (mainly rat) studies. This
investigation aimed to elucidate how the secretion of compounds into bile is
controlled by their molecular structure, and to develop predictive models based on
the molecular structure. Linear regression analysis, regression trees and two
ensemble methods, boosted trees and random forest, were used for QSAR model
development.

4.4.1. Comparison of the Models 

2731 Linear regression equation is one of the simplest and the most common QSAR 

2732 techniques. This method has the benefit of easy interpretation and it can provide 

2733 mechanistic insight into the process under investigation. However, it has been 

2734 argued that many biological processes have more complex relationships with the 

2735 molecular attributes of the compounds and hence linear regression models may fail 

2736 to capture these (Guha and Jurs, 2004). RT offers a suitable alternative to MLR 

2737 method with the advantage of being flexibly non-linear while retaining the 

2738 interpretability (De'ath and Fabricius, 2000). Ensemble methods such as random 

2739 forest (Breiman, 2001) provide consensus predictions which may have improved 

2740 accuracy. But this is often accompanied by a loss of interpretability, as the 

2741 ensemble of many models is often used as a ‘black box’ prediction tool. In this 

2742 investigation, STATISTICA variable importance analysis was used to find the most 

2743 significant molecular descriptors in the boosted trees and random forest models. 

According to Table 4.5, the most predictive model, with the lowest estimation error
for the external validation set, is RT (1), followed by BT (1) and BT (2) and then
I-tree (1). In other words, increasing the complexity of the models by allowing
non-linear relationships and an ensemble of such models has improved the prediction
accuracy in comparison with a simple linear regression model (MLR). Table 4.5 also
shows the number of outliers from each of the models. According to this table, RT (1),
followed by BT (2), BT (1) and then I-tree (1), are the best externally validated
models, with the lowest numbers of outliers in the validation set. The advantage of
RT is its simplicity and interpretability, which can make it more popular with end
users in drug discovery disciplines. For example, when using the tree for a new
compound, the molecular descriptors used in the tree need to be calculated for the
compound, and then the terminal node (leaf) into which the compound falls according
to its descriptor values is identified. The average log BE% of the terminal node (Mu)
is the estimate of the tree for this compound. Although RT provides discrete
predictions of a continuous observation, which is not ideal, this is a much more
straightforward procedure than using BT or RF for the estimation of BE%. These models
are ensembles of many trees, and therefore the prediction has to be performed by
computer rather than manually.
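
The manual look-up described above can be sketched in a few lines of Python. The
tree below is a deliberately truncated, hypothetical stand-in for I-tree (1): it
keeps only the first split on molecular weight (threshold 347.9 Da) and uses the
average log BE% values quoted in this chapter for node 2 (0.42) and node 3 (1.27) as
if both were leaves, whereas node 3 is split further in the actual tree. The
dictionary layout, the function name and the choice to send values equal to the
threshold down the left branch are assumptions.

```python
# Truncated, illustrative version of I-tree (1): first split only.
TREE = {
    "descriptor": "MW",
    "threshold": 347.9,                 # Da, first split of I-tree (1)
    "left": {"mean_log_be": 0.42},      # node 2: lower molecular weight
    "right": {"mean_log_be": 1.27},     # node 3: treated as a leaf here (assumption)
}


def predict_log_be(node, descriptors):
    """Walk from the root to a leaf and return that leaf's mean log BE%."""
    while "descriptor" in node:
        value = descriptors[node["descriptor"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["mean_log_be"]


if __name__ == "__main__":
    # Molecular weights taken from Table 4.7.
    print(predict_log_be(TREE, {"MW": 183.0}))   # fosmidomycin
    print(predict_log_be(TREE, {"MW": 357.0}))   # tolrestat
```

Every compound reaching the same leaf receives the same estimate, which is the
discrete-prediction limitation noted above.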

2763 An interesting observation was made as MW and COOH were not significant in 

2764 MLR equation when forced into stepwise regression analysis (P > 0.05). Despite 

2765 this, incorporation of these two parameters was statistically significant in C&RT 

2766 analysis resulting in I-tree (1) and I-tree (2). This indicates the non-linear nature of 

2767 the impact of these two parameters on biliary excretion. Average prediction by the 

2768 three RT models was also considered and found to be of similar accuracy to RT (1) 

2769 (Table 4.5). 

In this work, the MLR model based on the training set of 168 compounds had the
second poorest prediction accuracy after RF. Studies by Yang et al. (Yang et al.,
2009) and Chen et al. (Chen et al., 2010) report MLR models based on training sets
of 37 and 46 compounds, respectively. The model proposed by Yang et al. incorporated
molecular connectivity indexes and atom-type electrotopological indexes, which have
also been used in this study. The model proposed by Chen et al. also incorporated
molecular descriptors similar to those in our study, with the addition of Abraham
descriptors representing polarisability and hydrogen bond acceptor capacity.
Although we have not used Abraham’s descriptors, there are other molecular
descriptors in our set of 386 descriptors that measure the same properties. Examples
are the number of hydrogen bonding acceptor atoms and the atomic charge on the most
negatively charged atom in the molecule, which may represent hydrogen bond acceptor
ability (Dearden and Ghafourian, 1999), and molar refractivity descriptors, which may
indicate molecular polarisability (Verma and Hansch, 2005).

In another study, Luo et al. (Luo et al., 2010) used 50 proprietary compounds from
Bristol-Myers Squibb Co. for model development. They also developed a multiple
linear regression model but, in addition to more common molecular descriptors, they
employed the free energy of aqueous solvation calculated from a self-consistent
reaction field method. In analysing this model, Gandhi and Morris (Gandhi and
Morris, 2012) found that it failed to generalise to a new set of compounds and,
specifically, that the free energy of aqueous solvation was not statistically
significant. They argued that a complex process such as hepatobiliary excretion
cannot be captured by simple physicochemical properties when examining chemically
dissimilar compounds. Indeed, such extrapolations to external compounds will fail
when the compounds are outside the domain of applicability of the QSAR models.
Incorporation of a larger dataset in this work may provide the opportunity to
capture an extended chemical space. This will be discussed further when analysing
the outliers in the next two sections.

4.4.2. Structural Features of Compounds for Biliary Excretion 

Table 4.2 gives a brief description of the significant molecular descriptors used in
the models. For the sake of this discussion, the descriptors in this work can be
classified roughly into five categories: lipophilicity, ionisation, molecular size,
topological and constitutional descriptors.

It can be seen in Table 4.2 that lipophilicity descriptors such as log D at different
pH levels and surface area of hydrophilic molecules (SlogP_VSA0) are present in all
models. In all interpretable models (except for the linear regression equation),
lipophilicity descriptors show a negative effect on the biliary excretion of
compounds. This may relate to the fact that highly lipophilic compounds are known to
be highly extracted and metabolised in the liver (Proost et al., 1997) rather than
being excreted unchanged through bile or kidney. For example, metabolism by
cytochrome P450 enzymes (Lewis and Ito, 2010) and (UDP)-glucuronosyltransferase
(Smith et al., 2003) is mainly controlled by lipophilicity and increases for more
lipophilic compounds. There have been inconsistent findings in the literature
regarding the effect of lipophilicity on the biliary excretion of xenobiotics.
Proost et al. found no significant correlation between lipophilicity and biliary
excretion of a series of bulky organic cations, despite it being the predominant
factor for the degree of plasma protein binding and hepatic uptake rate (Proost et
al., 1997). Similar observations have been made for other compilations of biliary
excretion data (Yang et al., 2009). Other studies indicate a negative effect of
lipophilicity on biliary excretion within the range of compounds studied (Luo et
al., 2010; Varma et al., 2012). Lipophilicity has been associated with many models
of ADME properties (Hansch et al., 2004). It is well established that compounds with
higher logP have poorer aqueous solubility and are more likely to pass through the
lipid bilayer of biological membranes (Kerns and Di, 2008). The general trend in the
literature with regard to the role of lipophilicity in pharmacokinetic processes
indicates that more lipophilic compounds have higher oral absorption, plasma protein
binding and volume of distribution (van de Waterbeemd et al., 2001; Obach et al.,
2008; Newby et al., 2013b) and are more prone to P450 metabolism (Lewis and Ito,
2010; van de Waterbeemd et al., 2001). This may reduce the chance of excretion
through bile as the intact drug.

All models presented in this work indicate the significant role of ionisation and
polarity through molecular descriptors such as COOH, fU, FCASA- and SddssS_acnt.
Acids are able to ionise into anions, which are substrates of several transporters
(generally organic anion transporters). Compounds that carry positive as well as
negative charges or partial charges can use both the ‘organic anion’ and the
‘organic cation’ transport systems (Koepsell et al., 2001). For example, OAT3
accepts various kinds of bulky hydrophobic anions, while OAT1 can transport
relatively hydrophilic small molecules, such as nucleoside analogues (Maeda et al.,
2010). In addition, monocarboxylate transporters (MCT1 to MCT14) constitute a family
of proton-linked plasma membrane transporters that carry molecules having one
carboxylate group. MCT1 is expressed in nearly every tissue of the human body and
also in rat and calf hepatocytes (Kirat et al., 2007). MCT2 is abundant on the
surface of human, rat and hamster hepatocytes (Halestrap and Meredith, 2004). MCT5
and MCT8 are also known to play a transporting role in rat hepatocytes (Halestrap
and Meredith, 2004). Studies of the biliary excretion of exogenous compounds have
indicated a relation between polarity and biliary excretion, stating that possession
of a strongly polar anionic group is an important factor in appreciable biliary
excretion (Luo et al., 2010; Millburn et al., 1967). In all the interpretable models
reported here, polarity descriptors show a positive impact on biliary excretion.
Examples are the positive coefficient of dipole moment (AM1_dipole) in the linear
regression equation and the higher biliary excretion of compounds with lower
unionised fractions at pH 7.4 (fU) in RT (1).

Molecular size is the other important factor in biliary excretion, represented in
the models by molecular descriptors such as kappa shape indexes, hydrophilic volumes
(vsurf_W1 and vsurf_W3) and surface areas of atoms with specific charge or
lipophilicity ranges (e.g. PEOE_VSA_NEG and PEOE_VSA_HYD). These molecular
descriptors show a positive effect on the level of biliary excretion in all models.
This is in line with the common understanding that a molecular weight threshold may
apply to the biliary excretion of compounds, and that high molecular weight
compounds may be predominantly excreted through bile (Yang et al., 2009; Varma et
al., 2012; Millburn et al., 1967). Yang et al. (Yang et al., 2009) suggested a
molecular weight threshold of 400 Da for the biliary excretion of anionic drugs in
rats using 164 drugs. In this study, regression tree analysis found the threshold
value for molecular weight to be 347.9 Da for biliary excretion in rat (I-tree (1)).
Incidentally, this regression tree had the second highest prediction accuracy for
the external validation set amongst the RT models. This was despite the fact that
molecular weight was not the descriptor selected by the C&RT analysis itself.

The incorporation of some structural fragments in the models gave interesting
information regarding the molecular requirements for biliary excretion. Examples
include SddssS_acnt and SsssCH, which indicate higher biliary excretion of compounds
containing sulphate groups and non-branched structures (MLR). Compounds containing
carboxylic acid groups are also more likely candidates for biliary excretion
according to I-tree (2). Nearly half of the compounds in our dataset contain -COOH
groups (103 compounds out of 217). Sixty-five of these 103 COOH-containing compounds
had biliary excretion of > 20%. Varma et al. (Varma et al., 2012) have analysed the
interconnection between the physicochemical requirements of OATP substrates and
biliary excretion rates. It was suggested that the substrate specificity of OATPs,
including acidity, may primarily indicate elimination through bile (Varma et al.,
2012).

4.4.3. Analysis of the Outliers 

There are a number of compounds that are outliers from the majority of the models.
Analysis of outliers may provide interesting information regarding the applicability
of the models. Within the BE% range, compounds with low biliary excretion show a
higher average error in general (Table 4.6). For example, the average error by all
seven models was the highest for the six compounds with extremely low biliary
excretion (BE% < 0.23), followed by the compounds with 0.23 < BE% < 1.23 (-0.64 <
log BE% < 0.09). A closer inspection of the data reveals that, despite the high
average error for the six compounds with low biliary excretion, the estimation may
still be acceptable, as all these compounds have been estimated to have a BE% value
< 4% (average of all models) and below 0.6% by the RT (1) model, with only one
exception (benzoic acid). A hypothesis here could be that these compounds may have
suitable properties for higher biliary excretion, but other routes of elimination
predominate. For example, it has been shown for benzoic acid that when clearance by
the kidney is prevented, biliary excretion increases by 10% (Abou-El-Makarem et al.,
1967). Out of 217 compounds in the dataset, the predominant routes of elimination
are biliary excretion for 115 compounds, renal excretion for 65 compounds and
metabolism for 37 compounds. However, the outlier compounds do not belong to any
single group above in terms of the predominant route of elimination (see Figure 4.9
for a graph showing the predominant routes of elimination for the compounds in the
biliary excretion dataset).
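
Because the models are fitted to log BE%, it can help to convert between the two
scales when reading these error values. The short Python sketch below assumes
base-10 logarithms, which is consistent with the ranges quoted above (BE% of 0.23 and
1.23 corresponding to log BE% of -0.64 and 0.09); the function names are
illustrative only.

```python
import math


def to_log_be(be_percent):
    """Convert a percentage of dose excreted in bile to log10 units."""
    return math.log10(be_percent)


def from_log_be(log_be):
    """Convert a log10 BE% value back to a percentage."""
    return 10.0 ** log_be


def fold_error(abs_log_error):
    """Fold difference in BE% implied by an absolute error in log units."""
    return 10.0 ** abs_log_error


if __name__ == "__main__":
    for be in (0.23, 1.23):
        print(be, round(to_log_be(be), 2))
    print(round(fold_error(0.6), 2))
```

On this scale an absolute error of 0.6 log units corresponds to roughly a four-fold
difference between predicted and observed BE%, so a large log-scale error for a
compound with very low excretion can still amount to a small difference in
percentage terms.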




Table 4.6. Average MAE by nine models for compounds with various BE%, logP and
molecular weight values

Property   Range         Average MAE   n
BE%        <= 0.23       1.12          6
           0.23 - 1.23   0.50          26
           > 1.23        0.30 a        176
MW (Da)    > 280         0.31          173
           <= 280        0.54          35
Log P      > 5.35        0.63          13
           <= 5.35       0.33          195


Figure 4.9. The main routes of elimination for compounds in the biliary excretion
dataset (categories shown: biliary excretion, renal excretion and metabolism).

According to Table 4.6, highly lipophilic compounds (log P>5.35) and low 
molecular weight compounds (MW < 280) also show higher error rates, and this 
may need to be considered when using the models for the prediction of external 
compounds. 

Table 4.7 gives a list of the compounds that are outliers in six or seven of the
seven models proposed here. In addition, there are four compounds which were
outliers in four or five models but had exceptionally high average error from the
seven models. These compounds were part of the training or validation sets, but none
were omitted from average error calculations.

Table 4.7. Outlier compounds in training or validation sets with absolute error of
> 0.6 in more than five out of seven models and their BE% values.

Outliers       BE%     Log BE%   Over or under prediction       Models with error   MW
Benzoic acid   0.09    -1.07     over-predicted except for BT   4                   122
EMDP           0.20    -0.69     over-predicted                 6                   263
Fosmidomycin   0.10    -1.00     over-predicted                 7                   183
Nelfinavir     0.05    -1.32     over-predicted                 5                   567
EDDP           36.31   1.56      under-predicted                6                   277
PAEB           31.62   1.50      under-predicted                7                   222
Tolrestat      53.70   1.73      under-predicted                6                   357

The outliers in Table 4.7 have been over- or under-predicted by the models. One
compound in the table has been underestimated by some models and overestimated by
others: biliary excretion of benzoic acid was overestimated by all models except for
BT (1) and BT (2). It can be seen in Table 4.7 that fosmidomycin, nelfinavir and
2-ethyl-5-methyl-3,3-diphenyl-1-pyrroline (EMDP) are over-predicted by five or more
models. Benzoic acid is rapidly cleared by the kidney, so it may not have enough
time to pass into the bile (Abou-El-Makarem et al., 1967). Abou-El-Makarem and his
colleagues examined this possibility by tying up the renal pedicles in rats, so that
clearance by the kidney was prevented, and the results indicated that biliary
excretion then increased by 10% (Abou-El-Makarem et al., 1967). Fosmidomycin has a
short half-life of 1.7 h and is rapidly cleared by the kidneys (Murakawa et al.,
1982). It is a low molecular weight polar agent which may not be cleared in high
quantities through bile according to the molecular weight threshold hypothesis.
Despite the use of molecular size descriptors, this compound was still overestimated
by all seven models, even by I-tree (1), which employs MW for the first branching.
The problem with I-tree (1) in relation to this compound is that, although it falls
into node 2 along with 44 other low molecular weight compounds, this node has an
average log BE% of 0.42, which is much lower than that of node 3 (average log BE% of
1.27) but not low enough for this compound. Likewise, other models have indicated
low biliary excretion of small-sized compounds, but the estimate is still higher
than what is actually observed.

Nelfinavir has a half-life of 3.5 to 5 h and is eliminated via metabolism by the
cytochrome P450 enzyme system (Bardsley-Elliot and Plosker, 2000). This is a highly
lipophilic compound which is poorly excreted through bile, and is predicted as such
by the models (predicted BE% below 2% using all models except for I-tree (2) and RF,
which predict 13 and 7.6%, respectively).

2952 EMDP is a major metabolite of methadone which has been over-predicted by most 

2953 models despite a very low biliary excretion. As with nelfinavir, the predicted BE% 

2954 for this compound by most models is quite low at < 4% (MLR is an exception) and 

2955 the selected model, RT (1), predicts a biliary excretion value of ~0.3%. Despite 

2956 this, in comparison with the extremely low observed value of 0.05%, the predicted 

2957 values are much higher, leading to a numerically large average error, even though 

2958 qualitatively, the predicted biliary excretion may be reasonably low. 

EDDP, PAEB (procaine amide ethobromide) and tolrestat are the under-predicted
compounds. All these compounds have high BE% values, at 36, 32 and 54%,
respectively. This is despite the relatively low molecular weights of EDDP and PAEB,
which are below the defined MW threshold of 347.9 Da for biliary excretion. The
mechanism behind the high biliary excretion of these compounds despite their low
molecular weight warrants further investigation. Tolrestat has a relatively high
molecular weight suitable for biliary excretion and a COOH group making it a
suitable substrate for OATPs (Varma et al., 2012). Despite this, the hydrophilic
volume calculated by the VolSurf descriptor vsurf_W3 is not high enough to put this
compound in node 3 rather than node 2 of the RT (1) model. In I-tree (1), the
compound falls into node 16, owing to the lack of a non-aromatic branched structure,
which would have placed it in node 17 with a higher predicted BE%. Likewise, in
I-tree (2), this compound fails to be placed in node 7 and falls in node 6 instead
due to the low total negative charge (> -2.33) as a result of the low number of
negatively charged atoms. This indicates a shortcoming in the abovementioned models,
which lack suitable parameters that can capture the relative polarity in relation to
the molecular size.

4.5. Conclusion 

2978 This investigation focused on the development of computational models for a cost- 

2979 effective estimation of biliary excretion of compounds. This was made possible 

2980 through the application of quantitative structure-activity relationships where 

2981 molecular properties (descriptors) of a large dataset of compounds were related to 

2982 the percentage of dose excreted intact via the bile through the use of statistical 

2983 techniques. Some of the statistical techniques led to very promising results as 

2984 evaluated by the prediction accuracy for the external validation set. The QSAR 

2985 models also identified the important molecular properties (descriptors) that have the 

2986 main influence on biliary excretion of compounds. The selected models were the 

2987 regression tree (C&RT) model, RT (1), followed by boosted trees models BT (1) 

2988 and BT (2). Regression trees also have the advantage of being simple, interpretable 

2989 and user-friendly. The models generally indicated that larger, relatively hydrophilic 

2990 molecules containing a carboxylic acid group are more prone to biliary excretion. 

2991 For example, in the selected model, RT (1), compounds with increased hydrophilic 

2992 volume and acidic dissociation have high biliary excretion. The significance of 

2993 acidity and molecular size were further confirmed through interactive regression 

2994 trees and a statistically validated MW threshold for effective biliary excretion was 

2995 established. Detailed analysis of the error levels and outliers indicated that the 

2996 models work best for larger compounds (MW >280 Da) and are less accurate for 

2997 extremely lipophilic compounds (log P > 5.35). 

5. Effect of P-gp Binding on Biliary Excretion 

5.1. Introduction 

One in four deaths in the United States is due to cancer, and the American Cancer
Society recently projected that a total of 1,660,290 new cancer cases and 580,350
cancer deaths would occur in the United States in 2013 (Siegel et al., 2013). The
failure of cancer treatment can be attributed to a variety of pharmacological and
clinical reasons, but one major cause of treatment failure is multidrug resistance
(MDR) to chemotherapeutics (Song et al., 2010). MDR mechanisms can result in
resistance to a number of structurally and functionally unrelated chemotherapeutic
agents. Multidrug resistance is mainly linked to the activity of transmembrane
efflux pumps such as P-glycoprotein 1 (P-gp/ABCB1), breast cancer resistance protein
(BCRP/ABCG2) and multidrug resistance-associated protein 1 (MRP1/ABCC1), which are
members of the ATP-Binding Cassette transporter family (Krishna and Mayer, 2000).
P-gp, also known as multidrug resistance protein 1 (MDR1), is a well-studied
glycoprotein which was first discovered in 1976 by surface labelling studies in
multidrug resistant Chinese hamster ovary cells (Juliano and Ling, 1976). Since
then, it has been shown to function as a transporter of hydrophobic drugs, lipids,
steroids and metabolic products.

3021 Overexpression of P-gp in cancer cells contributes significantly to the resistance of 

3022 cancer cells against chemotherapeutic agents (Gottesman, 2002). As a strong efflux 

3023 pump, P-gp is able to export a number of structurally diverse anticancer agents 

3024 including anthracyclines, epipodophyllotoxins and vinca alkaloids. As a result, P-

3025 gp has been suggested as a viable target to be inhibited in the treatment of

3026 multidrug resistant cancer (Szakacs et al, 2006). Drugs such as actinomycin-D and 

3027 azithromycin can strongly block P-gp and limit the efflux of P-gp substrates.

3028 Inhibitors that block the transport of chemotherapeutics or other compounds may 

3029 act as competitive or non-competitive inhibitors (Ambudkar et al, 1999). In recent 

3030 years, the inhibitory activity against P-gp has been tested in many compounds in 
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3031 order to overcome P-gp mediated resistance of cancer cells to the 

3032 chemotherapeutics (Pajeva et al., 2009). 

3033 In addition to its role in multidrug resistance, P-gp has a profound role in 

3034 pharmacokinetics, affecting drug absorption, distribution and excretion (Lin and 

3035 Yamazaki, 2003). It is found in high amounts at the apical surface of epithelial cells 

3036 lining the colon and small intestine, hepatocytes, pancreas ductules, proximal 

3037 tubules in kidneys, and the adrenal gland (Schinkel and Jonker, 2003; Dean, 2002). 

3038 P-gp is also known to play a major role in transporting compounds out of the brain 

3039 at the blood-brain barrier (BBB) (Mahno et al., 2013). At the BBB, only suitably lipophilic

3040 compounds can diffuse across the endothelial cells and enter the brain. However, a 

3041 high proportion of P-gp that surrounds this area of the brain prevents their 

3042 accumulation by distributing substrates back into the blood circulation (Mahno et 

3043 al., 2013). In the gastrointestinal tract and in hepatocytes, P-gp is responsible for 

3044 the efflux of drugs back into the lumen/bile, thus reducing the bioavailability of

3045 substrate drugs (Giacomini et al., 2010). Similarly, in kidneys, P-gp is located 

3046 primarily in glomerular mesangium and the apical membrane of proximal tubule 

3047 epithelia and plays a significant role in the tubular secretion of organic cations 

3048 (Giacomini et al., 2010). 

3049 As stated earlier, P-gp is poly-specific and can efflux a very broad range of 

3050 substrates. The substrates can have molecular weights ranging from 250 to 1850 

3051 Da, different ionization states, acid/base properties, hydrophobicities or 

3052 amphipathic properties (Kerns and Di, 2008). There are drugs and herbal products 

3053 that can affect the function of P-gp transporters and the number of drugs that are 

3054 found to be P-gp substrates is continually growing. For instance, rifampin (an

3055 antituberculosis drug) induces the intestinal expression of P-gp (Ehrhardt and Kim, 

3056 2008). Due to the broad substrate specificity of P-gp, drug-drug interactions 

3057 involving P-gp are very likely (Lin, 2003). Drug-drug interaction is an important 

3058 issue observed in cancer patients, especially because they often receive multiple 

3059 medications concurrently with complex chemotherapy regimens (Wong et al., 

3060 2008). Due to the importance of P-gp in drug interaction, the FDA has urged that 

3061 every new molecular entity should be routinely checked for a possible interaction 

3062 with P-glycoproteins (FDA Guidelines, 2014).

3063 Multiple binding sites are available on P-gp. Generally, P-gp inhibition can happen

3064 in three different ways. Firstly, by reversibly blocking the binding of substrate drugs;

3065 this can be allosteric, competitive or non-competitive. In competitive inhibition,

3066 the inhibitor is a structural analogue of the substrate and competes with the substrate to bind

3067 to the active site of the enzyme. The decreased activity observed is due to a decrease

3068 in the number of enzyme-substrate complexes formed. Competitive inhibition is totally

3069 reversible by adding an excess amount of the substrate. The relative concentrations of

3070 substrate and inhibitor and their respective affinities for the enzyme determine the

3071 degree of competitive inhibition (Raju). A non-competitive inhibitor is one that displays

3072 binding affinity for both the free enzyme and the enzyme-substrate complex. In this

3073 mode, the binding affinity cannot be defined by a single equilibrium dissociation

3074 constant but requires two dissociation constants, one for the binary enzyme-inhibitor

3075 complex (Ki) and one for the ternary enzyme-substrate-inhibitor complex

3076 (Copeland, 2005). Uncompetitive inhibition is less common; in this mode the inhibitor

3077 does not bind to the free enzyme but binds only

3078 to the enzyme-substrate complex, which can be detected by plotting the kinetic data. Inhibition of placental alkaline phosphatase by

3079 phenylalanine is an example of uncompetitive inhibition (Raju). 

3080 Secondly, by interacting with the ATP hydrolysis site, since P-gp is inactive

3081 when the ATP hydrolysis site is blocked (Shapiro and Ling, 1997; Urbatsch et al.,

3082 1995). Although the majority of drugs inhibit P-gp by blocking the substrate binding

3083 sites (Varma et al., 2003), the presence of multiple binding sites should be considered

3084 in substrate or inhibitor studies. Besides, P-gp may be induced by various

3085 agents such as ritonavir (Perloff et al., 2001). 
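
As a hedged illustration of the reversible inhibition modes described above, the following Python sketch uses the standard steady-state rate laws of enzyme kinetics. The function name and its arguments (`inhibited_rate`, `ki_free`, `ki_complex`) are hypothetical and were not part of this study, and treating P-gp transport as simple Michaelis-Menten kinetics is an assumption made only for illustration.

```python
def inhibited_rate(s, i, vmax, km, ki_free=None, ki_complex=None):
    """Steady-state rate in the presence of a reversible inhibitor.

    ki_free    -- dissociation constant for binding to the free enzyme
    ki_complex -- dissociation constant for binding to the enzyme-substrate complex
    None means the inhibitor does not bind that enzyme form.
    """
    km_factor = 1.0 if ki_free is None else 1.0 + i / ki_free
    s_factor = 1.0 if ki_complex is None else 1.0 + i / ki_complex
    return vmax * s / (km * km_factor + s * s_factor)


# Competitive: binds the free enzyme only; excess substrate restores the rate.
competitive = inhibited_rate(s=1e6, i=10.0, vmax=1.0, km=1.0, ki_free=1.0)
# Uncompetitive: binds the enzyme-substrate complex only; excess substrate does not help.
uncompetitive = inhibited_rate(s=1e6, i=10.0, vmax=1.0, km=1.0, ki_complex=1.0)
# Non-competitive: binds both forms, so two dissociation constants are needed.
noncompetitive = inhibited_rate(s=1.0, i=1.0, vmax=1.0, km=1.0,
                                ki_free=1.0, ki_complex=2.0)
```

With a large substrate excess the competitive rate approaches the uninhibited maximum, whereas the uncompetitive rate stays suppressed, which matches the qualitative description in the text.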

3086 Numerous well-known multispecific drug transporters are involved in liver 

3087 canalicular efflux of many xenobiotics (Pfeifer et al., 2014). Of these transporters, 

3088 P-gp is the most widely studied efflux transporter in biliary excretion.

3089 This transporter is responsible for transporting mainly large lipophilic and

3090 cationic substrates into the bile canaliculus (Oza, 2002). It has been shown in

3091 genetically modified mice lacking mdr1-type (drug-transporting) P-gp that substrate

3092 drugs such as digoxin may have a reduced elimination (Schinkel et al., 1997).

3093 Moreover, mutations in the human MDR3 gene responsible for P-gp lead to

3094 progressive familial intrahepatic cholestasis, in which biliary phospholipid

3095 excretion is lacking (de Vree et al., 1998). Another example regarding the importance of P-gp

3096 in biliary excretion of drugs is the P-gp substrate imatinib, which shows a 

3097 significantly reduced fecal excretion in P-gp knockout mice or in the presence of P- 

3098 gp inhibitors (Oostendorp et al., 2009). 

3099 Given the clinical relevance of P-gp, it is important to elucidate its mode

3100 of interaction with modulators and substrates. Higgins and

3101 colleagues suggested a model for P-gp polyspecificity, namely the “hydrophobic

3102 vacuum cleaner” model (Higgins and Gottesman, 1992). In the proposed model, the

3103 hydrophobic substrates enter the transmembrane domain of P-gp and are

3104 transported outside the cell. A recent study by Aller et al. (2009)

3105 provided a detailed structural description of mouse P-gp, which indicates a 

3106 substantial internal cavity comprising mostly hydrophobic and aromatic residues. 

3107 Despite the substrate promiscuity, several studies have been valuable in identifying 

3108 structure activity relationships for the modulators. Evidence from X-ray

3109 crystallography (Aller et al., 2009), chromatography (Lu et al., 2001) and several

3110 biochemical techniques (Martin et al., 2000; Maki et al., 2006) suggests the presence

3111 of multiple substrate-binding sites and a number of inhibition mechanisms, which

3112 may be the cause of substrate promiscuity. As a result, it may be necessary to

3113 generate more than one pharmacophore for P-gp (Ekins and Erickson, 2002).

3114 The quantitative data available for P-gp are mostly IC50 values

3115 for the inhibitors. On the other hand, very few substrate Km measures are found in

3116 the literature, despite the availability of binary data of substrate/non-substrate

3117 (Matsson et al., 2009). As a vast majority of the reported IC50 values are for

3118 compounds that also act as substrates, with the exception of flavonoids which are

3119 believed to be able to bind to the ATP site as well as the substrate binding site

3120 (Kim, 2002), the inhibition constants may also indicate the binding capacity of the

3121 compounds. As a result, in this investigation, the IC50 and Ki values were collated

3122 for the QSAR studies. The use of IC50 (concentration of inhibitor required for 50%

3123 inhibition) has the disadvantage of not allowing easy comparison of data from

3124 different substrate conditions. Unlike IC50, the inhibition constant, Ki, is a more

3125 universal parameter that is standardised according to the substrate concentration

3126 and Km values (Cheng and Prusoff, 1973). A Ki value relates to the enzyme-

3127 inhibitor complex and indicates the strength of the interaction.

3128 The broad aim of this investigation was to study the effect of P-gp binding on the

3129 QSAR models for the estimation of biliary excretion. To achieve this, first, several data mining

3130 techniques were used to enable development of universal models for the prediction 

3131 of the P-gp inhibition constant (Ki). In enzyme-inhibitor binding equilibria, the

3132 enzymatic reaction begins with the reversible binding of substrate to the free

3133 enzyme to form the enzyme-substrate complex, as quantified by the dissociation

3134 constant (Ks). The enzyme-substrate complex thus formed goes on to generate the

3135 reaction product through a series of chemical steps that are collectively defined by

3136 the first-order rate constant. The first mode of inhibitor interaction that can be 

3137 considered is one in which the inhibitor binds to the free enzyme in direct 

3138 competition with the substrate. The equilibrium between the binary enzyme- 

3139 inhibitor complex and the free enzyme and inhibitor molecules is defined by 

3140 dissociation constant (Ki) (Copeland, 2005). In these models, the use of molecular

3141 descriptors for the substrates in addition to the inhibitor parameters may be useful

3142 for splitting of the Ki data if the substrate type has an effect on the measured Ki

3143 values. Secondly, docking scores were investigated as a complementary parameter

3144 to assess the significance of interaction energy between the ligands and P-gp in

3145 the models for estimation of the binding constants. Third, the selected QSAR

3146 models were used for the prediction of P-gp binding constants of the compounds in

3147 the biliary excretion dataset. Finally, the predicted P-gp dissociation constant (briefly

3148 log Ki) values were used as predictors in the QSAR models for the prediction of

3149 biliary excretion. 

3150 5.2. Methods 

3151 5.2.1. P-gp Dataset 

3152 IC50 and Ki values for P-gp inhibitors were collated from the literature (Cook et al.,

3153 2010; Choo et al., 2000; Dantzing et al., 1996; Eberl et al., 2007; Ekins et al.,

3154 2002a; Ekins et al., 2002b; Eriksson et al., 2006; Kakumoto et al., 2002; Katoh et

3155 al., 2001; Keogh et al., 2006; Lan et al., 1996; Lumen et al., 2010; Luo et al., 2002;

3156 Matsson et al., 2009; Neuhoff et al., 2000; Noguchi et al., 2009; Pauli-Magnus et

3157 al., 2000; Petri et al., 2004; Rautio et al., 2006; Richter et al., 2009; Shaik et al.,

3158 2007; Tang et al., 2002a; Tang et al., 2002b; Wandal et al., 1999 and Wang et al.,

3159 2001). IC50 values of P-gp inhibitors were used to calculate the Ki values using the

3160 Cheng-Prusoff equation below.

3161 $$K_i = \frac{IC_{50}}{1 + \frac{[S]}{K_m}} \qquad \text{Eq (1)}$$

3162 In this equation [S] is the substrate concentration and Km is the Michaelis-Menten

3163 constant for the substrate (the concentration of substrate at which enzyme activity is

3164 at half maximal). If Km values for the substrates were not reported in the

3165 publication, then they were obtained from the authors through personal

3166 communication. The rationale behind converting the IC50 values to Ki values is that

3167 the Ki is a more universal scale, which in theory should be independent of the

3168 substrate used. 
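
The conversion in Eq (1) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `cheng_prusoff_ki` and the numerical inputs are hypothetical, and this form of the Cheng-Prusoff relationship strictly applies to competitive inhibition, which is assumed here.

```python
import math


def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Convert IC50 to Ki using Eq (1); all inputs share one concentration unit."""
    if ic50 <= 0 or km <= 0 or substrate_conc < 0:
        raise ValueError("ic50 and km must be positive; substrate_conc must be >= 0")
    return ic50 / (1.0 + substrate_conc / km)


# Made-up numbers for illustration: IC50 = 10 uM measured at [S] = 5 uM, Km = 5 uM.
ki = cheng_prusoff_ki(ic50=10.0, substrate_conc=5.0, km=5.0)
log_ki = math.log10(ki)  # the QSAR models use log Ki as the dependent variable
```

When the probe substrate concentration is far below Km, the denominator approaches one and Ki is close to the measured IC50.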

3169 Where several IC50/Ki values were available for a single inhibitor from

3170 different sources, the average Ki values were used, unless the probe substrate was

3171 different. If there was a significant difference in the reported IC50/Ki values, we

3172 contacted the authors to find out if they could provide an explanation for the

3173 observed differences before using the reported values. In total the dataset consisted

3174 of Ki values for 219 unique inhibitor/substrate pairs, with data measured in different

3175 cell systems including Caco-2, MDCK-MDR1, MDCK II-MDR1, K562-MDR,

3176 MDR1 transfected LLC-PK1 and P388 lymphoma cells. Human colon carcinoma

3177 cell line (Caco-2) and Madin-Darby canine kidney cells (MDCK) were the most

3178 common cell lines used in our dataset. The inhibitors in the dataset are from

3179 different chemical/pharmacological classes such as anticancer and anti-HIV agents,

3180 statins, antiretrovirals, cephalosporins, ergopeptides, antipsychotics, opioids,

3181 NSAIDs, analgesics, and antiarrhythmic drugs. The dataset is presented in

3182 Appendix II. 
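
The averaging rule described above (replicate values are pooled only when inhibitor and probe substrate are the same) can be sketched as follows. The record layout, the function name `average_ki` and the example values are hypothetical and are not taken from Appendix II; the use of an arithmetic mean of Ki rather than of log Ki is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def average_ki(records):
    """records: iterable of (inhibitor, probe_substrate, ki) tuples."""
    grouped = defaultdict(list)
    for inhibitor, substrate, ki in records:
        # Values measured with a different probe substrate stay in separate groups.
        grouped[(inhibitor, substrate)].append(ki)
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical values for illustration only.
records = [
    ("inhibitor_A", "digoxin", 2.0),
    ("inhibitor_A", "digoxin", 4.0),
    ("inhibitor_A", "calcein-AM", 9.0),
]
averaged = average_ki(records)
```

Each key of `averaged` then corresponds to one unique inhibitor/substrate pair, the unit of observation used in the dataset.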

3183
3184 5.2.2. P-gp-Ligand Docking

3185 Docking energy for all inhibitors was calculated using MOE software (version 

3186 2012.10, Chemical Computing Group Inc. Montreal, Canada). Later, the docking 

3187 scores of the inhibitors were used as an additional molecular descriptor by adding them

3188 as a column to the dataset.

3189 The X-ray structure of the mouse P-gp was obtained from the protein data bank 

3190 (PDB code 3G60) [http://www.rcsb.org]. The use of this PDB structure was due to 

3191 a previous docking investigation that showed better scoring poses using mouse 

3192 3G60 structure in comparison with the other two mouse P-gp structures (PDB 

3193 codes: 3G61 and 3G5U), or the human homology model of P-gp (Loschmann et al., 

3194 2013). It should be noted that this structure of mouse P-gp was co-crystallised with 

3195 a ligand and the complex had two stereo-isomers of cyclic hexapeptide inhibitors, 

3196 cyclic-tris-(R)-valineselenazole (QZ59-RRR) and cyclic-tris-(S)-valineselenazole 

3197 (QZ59-SSS) in the active site (Aller et al., 2009). The protein was protonated and 

3198 protonatable residues were titrated using default parameters of the software before 

3199 the docking exercise. Molecular structures of the ligands (P-gp inhibitors) were 

3200 optimised after atomic charge calculation using SCF optimization (AM1

3201 Hamiltonian). In enzyme-ligand docking, default parameters of the software were 

3202 used for ligand interactions. These are energy cut-off for H-bond and ionic 

3203 interactions of -0.5 kcal/mol and maximum distance for non-bonded interactions of 

3204 4.5 Å. In the MOE dock panel, the placement method was Triangle Matcher, the

3205 scoring methodology was set to London dG as the first and the second scoring 

3206 functions, the refinement methodology was set to Forcefield, and finally, the 30 

3207 best scoring poses, the mean energies and backbone root

3208 mean square deviation (RMSD) were retained. The binding site was defined in 

3209 MOE software using the co-crystallised ligand QZ59-RRR. 

3210 Preparation of compounds for Docking 

3211 Before docking could take place, the SDF file was imported into the MOE 

3212 software. MOE is a suite of applications that can be used to manipulate and analyse 

3213 a collection of compounds. For docking to work efficiently, it is essential that each 

3214 structure is in a form suitable for docking. As a result, the

3215 software’s ‘Wash’ application was used to clean the structures and neutralise the

3216 protonation state of each compound. This neutralises all atoms and generates the

3217 structure of the compound in its least charge-bearing state. The next step was to

3218 lower the potential energy of the structures. This was completed using the “Energy

3219 minimize” function of the software. The compounds in the database were then

3220 ready for computation, and molecular descriptors were calculated.

3221 Validation of docking experiment 

3222 The published X-ray crystallography structures (Aller et al., 2009, Gutmann et al., 

3223 2010) were used to validate our docking model by comparing the geometries of the 

3224 docked Abcb1a/QZ59-RRR structure and the structure of the Abcb1a/QZ59-RRR

3225 complex from X-ray crystallography and measuring root-mean-square deviation 

3226 (RMSD) between them. 
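
For readers unfamiliar with the metric, the RMSD between a docked pose and the crystallographic pose can be sketched as below. This is a plain-Python illustration, not the MOE implementation; it assumes that both coordinate lists contain the same atoms in the same order and it performs no superposition. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates: the "docked" pose is the "crystal" pose shifted by 1 along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in crystal]
shift_rmsd = rmsd(crystal, docked)
```

A rigid shift of every atom by one unit gives an RMSD of one unit, which is a convenient sanity check for any implementation.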

3227 

3228 5.2.3. Model Development and Validation 

3229 Development of models for P-gp 

3230 To perform QSAR analyses, P-gp inhibitors were divided into validation and

3231 training sets. To divide the inhibitors, they were ordered with ascending Ki values,

3232 and then from every five compounds, four were allocated into the training and one

3233 into the validation set randomly. This ensured similar Ki ranges for the validation

3234 and training sets. In this way, training data consisted of 176 compounds and 

3235 external validation set consisted of 43 compounds. 
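
The allocation procedure described above can be sketched in Python as follows. The function name `split_every_fifth`, the seed and the placeholder values are hypothetical; sending the incomplete final block to the training set is an assumption, chosen because it reproduces the reported 176/43 split for 219 compounds (43 complete blocks of five plus four remaining compounds).

```python
import random


def split_every_fifth(values, block_size=5, seed=0):
    """Return (train_indices, validation_indices) after sorting by value."""
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda k: values[k])
    train, validation = [], []
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        if len(block) < block_size:
            train.extend(block)  # assumption: leftover compounds go to training
            continue
        held_out = rng.choice(block)  # one random compound per block of five
        validation.append(held_out)
        train.extend(k for k in block if k != held_out)
    return train, validation


# Placeholder log Ki values for 219 inhibitor/substrate pairs.
log_ki = [0.01 * k for k in range(219)]
train_idx, validation_idx = split_every_fifth(log_ki, seed=1)
```

Because one compound is drawn from every consecutive block of the sorted list, the validation set spans the same range of values as the training set.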

3236 In this study, QSARs were established to relate the P-gp binding effect of 

3237 compounds (log Ki) to the molecular descriptors and P-gp docking scores.

3238 Molecular descriptors were calculated according to the procedures explained in 

3239 section 3.1. Before building the models, the molecular descriptors were checked to 

3240 find and discard those columns containing more than 98% constant values or more 

3241 than 10% missing values. The total number of molecular descriptors used in all 

3242 statistical analyses was 388. 
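
The descriptor screening step can be sketched as below. This is an illustrative reading of the two thresholds and not the procedure as implemented in the original software: "constant values" is interpreted here as the share of the single most frequent non-missing value, missing values are represented by `None`, and the function and column names are hypothetical.

```python
from collections import Counter


def keep_descriptor(values, max_constant=0.98, max_missing=0.10):
    """Return True if a descriptor column passes both screening thresholds."""
    present = [v for v in values if v is not None]
    if not present:
        return False
    if (len(values) - len(present)) / len(values) > max_missing:
        return False  # more than 10% missing values
    most_common_count = Counter(present).most_common(1)[0][1]
    return most_common_count / len(present) <= max_constant


# Hypothetical descriptor columns with 100 compounds each.
descriptors = {
    "varied": [float(k) for k in range(100)],
    "nearly_constant": [0.0] * 99 + [1.0],
    "too_sparse": [None] * 11 + [float(k) for k in range(89)],
}
kept = [name for name, column in descriptors.items() if keep_descriptor(column)]
```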
3243 STATISTICA Data Miner version 11 was used for the statistical analysis.

3244 Statistical methods consisted of decision tree methods and ensemble methods 

3245 including Classification and Regression Tree (C&RT), Chi-square Automatic 

3246 Interaction Detector (CHAID), Boosted Trees (BT) and Random Forest (RF). 

3247 Moreover, Multivariate Adaptive Regression Splines (MARS) model was also 

3248 developed. These methods have been explained in Chapter 3. Log Ki was the

3249 dependent variable and the predictors were selected by the embedded feature 

3250 selection methods in C&RT, CHAID, BT and RF from all the molecular descriptors 

3251 and docking scores available for the inhibitors and substrates. In C&RT analysis, 

3252 several stopping criteria were examined, including the default settings in 

3253 STATISTICA. The default stopping criteria were minimum number of cases of 24 

3254 to allow further splitting, and the maximum number of nodes set to 100. V-

3255 values of 10 or seven were used in the V-fold cross-validation. In CHAID analysis,

3256 STATISTICA default settings for stopping criteria were used, including minimum

3257 number of cases for splitting of 22, maximum number of nodes of 1000, probability 

3258 for splitting of 0.05 and probability for merging of 0.05. In BT analysis, the default 

3259 values for learning rate, the number of additive terms, random test data proportion 

3260 and subsample proportion were 0.1, 200, 0.2 and 0.5 respectively. Various 

3261 subsample proportions of 0.45, 0.50, 0.55 and 0.60 were also examined in 

3262 combination with the learning rates of 0.10, 0.03, 0.05 and 0.08. In RF analysis, 

3263 various subsample proportions of 0.45, 0.50, 0.55 and 0.60 were examined. The 

3264 random test data proportion was 0.3 for the internal validation and number of trees 

3265 was 100. The default settings were used for stopping conditions including minimum 

3266 number of cases, maximum number of levels, minimum number in child node and 

3267 the maximum number of nodes of 5, 10, 5 and 100, respectively. 
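
The boosted trees settings examined above can be enumerated as a small grid. The sketch below only lists the combinations and does not call STATISTICA; it assumes that every subsample proportion was paired with every learning rate, and the dictionary keys are hypothetical names.

```python
from itertools import product

subsample_proportions = [0.45, 0.50, 0.55, 0.60]
learning_rates = [0.10, 0.03, 0.05, 0.08]

# Assumption: a full grid, with the other settings held at the stated defaults
# (200 additive terms, random test data proportion of 0.2).
bt_grid = [
    {"subsample": p, "learning_rate": r, "n_terms": 200, "test_proportion": 0.2}
    for p, r in product(subsample_proportions, learning_rates)
]
```

Under this assumption the search covers 16 boosted trees configurations.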

3268 For the development of MARS model, several pre-processing feature selection 

3269 techniques were examined. Feature selection methods were a Chi-square method as 

3270 implemented in STATISTICA v11 (StatSoft Ltd.) developed by Hill and Lewicki

3271 (Hill and Lewicki, 2006), stepwise regression analysis, and variable importance 

3272 rank from random forest and boosted trees analyses. The Chi-square-based feature 

3273 selection in STATISTICA picks a subset of descriptors from the descriptor pool 

3274 without assuming that the relationships between the predictors and the dependent 

3275 variables are linear or even monotone. In this feature selection, the range of

3276 continuous variable values was divided into 10 intervals. The best variables picked

3277 by STATISTICA feature selection, the best descriptors selected by stepwise 

3278 regression analysis, as well as the top 5, 10, 15, 20 and 25 descriptors picked by 

3279 RF, and the top 5, 10 and 15 descriptors picked by BT were examined in separate 

3280 MARS analyses and the resulting models were compared. In MARS analysis, the 

3281 default model specifications for maximum number of basis functions, degree of 

3282 interactions, penalty and threshold were 21, 1, 2 and 0.0005 respectively.

3283 The best model from each analytical method was selected based on the performance

3284 indicators for the internal validation set. 

3285 

3286 Development of models for biliary excretion incorporating predicted P-gp activity 

3287 The selected P-gp dissociation constant (Ki) models above were used to predict the

3288 log Ki values for compounds in the biliary excretion dataset (n = 217). QSAR models

3289 were developed for biliary excretion using the dataset and methods explained in 

3290 Chapter 4. In addition to the molecular descriptors, the P-gp effects predicted by the 

3291 selected models from section 5.2.3 were used as the independent variables of the 

3292 analyses. In addition to stepwise regression analysis, C&RT, boosted trees and 

3293 random forest methods, two additional methods, CHAID, and MARS, were also 

3294 used for development of QSARs for biliary excretion using the procedure explained 

3295 above for P-gp models. In some C&RT models, the predicted Ki effects were

3296 manually incorporated in the models, when they were not picked by C&RT feature 

3297 selection automatically. 

3298 

3299 5.3. Results 

3300 This chapter will present the results of QSAR development for P-gp binding 

3301 followed by the QSAR models for biliary excretion that incorporate predicted P-gp

3302 binding values as molecular descriptors. 

3303
3304 5.3.1. Modelling the P-gp Dissociation Constant (Ki)

3305 P-gp is an important polyspecific transporter protein that can significantly affect the 

3306 pharmacokinetics of various pharmaceuticals as well as the effectiveness of

3307 chemotherapeutics. Due to the major effect of P-gp efflux system in biliary 

3308 excretion of compounds, it is important to investigate the structural requirements 

3309 for P-gp binding and predict the binding constants using QSAR. In this 

3310 investigation, a large dataset of inhibition constants was collated to investigate the

3311 development of a universal model for P-gp binding. To help overcome the problem

3312 of heterogeneity of the data from various laboratories, which used various

3313 substrates at differing concentrations in the design of their experiments, several

3314 strategies were implemented. First, the IC50 values were converted to Ki values,

3315 which are a more comparable measure of inhibitory activity. Secondly, the molecular

3316 descriptors of the probe substrates were also used in the analyses and model 

3317 development process. Third, docking scores from ligand-P-gp docking experiments 

3318 were incorporated as a molecular descriptor to aid the prediction accuracy of the 

3319 models. Fourth, non-linear decision tree and MARS methods were employed;

3320 these are flexible and therefore, in theory, should be able to deal with more

3321 heterogeneous data. 

3322 

3323 5.3.1.1. P-gp Ligand Docking 

3324 Docking energy for all compounds was calculated using MOE software and was 

3325 used as a molecular descriptor. First, in order to verify the docking methodology

3326 using MOE software, the geometries of the docked P-gp/QZ59-RRR and P- 

3327 gp/QZ59-RRR complexes from X-ray crystallography were compared and RMSD 

3328 between them was calculated. The RMSD value for this structure after 

3329 superimposing the docked and co-crystal structures was 0.77; the absolute RMSD 

3330 range without superposing was 0.89-6.2 for the top 30 poses. 

3331 Figure 5.1 shows the 3D structure of P-gp using MOE software. An example 

3332 substrate can be seen in yellow at the internal cavity corresponding to the QZ59-RRR

3333 binding site.
3334

3335 Figure 5.1. Ribbon drawing (front stereo view) of mouse P-gp (PDB id: 3G60) 3D 

3336 structure in MOE screen shot. The yellow bulb at the lower parts represents the 

3337 potential binding residues of mouse P-gp in the internal cavity. QZ59-RRR binding 

3338 site is located in the binding pocket in the lower side of the P-gp cavity. The alpha helices and

3339 beta sheets of P-gp are shown in red and yellow, respectively.

3340 

3341 Examples of docking results 

3342 Below are examples of P-gp docking of two P-gp substrate/inhibitors namely BMS- 

3343 387032 (Figure 5.2 and Table 5.1) and SNS-032 (2D diagram is presented in Figure 

3344 5.3 and 3D diagram is presented in Figure 5.4). These two compounds have been

3345 assessed as potential drugs in multidrug resistant cancer treatment (Michaelis et al.,

3346 2014; Loschmann et al., 2013). 



3348 Figure 5.2. The docked conformation of BMS-387032 in the binding pocket of 

3349 mouse P-gp with the lowest docking energy; blue arrows are strong hydrogen bonds 

3350 (limited within 4.5 Å) between residues of Ser725 and Gln721 and nitrogen in

3351 thiazole and piperidine respectively. Val978 and Phe974 are other residues with pi- 

3352 H and pi-pi interactions with the BMS-387032 respectively. 

3353 Table 5.1. Ligand interaction parameters for binding of BMS-387032 to mouse P-

3354 gp (3G60) at the QZ59-RRR binding site (first docking pose)

| Fragment of Ligand   | Receptor | Interaction | Distance (Å) | E (kcal/mol) |
|----------------------|----------|-------------|--------------|--------------|
| Nitrogen in Thiazole | SER725   | H-acceptor  | 3.47         | -0.7         |
| Piperidine           | GLN721   | H-acceptor  | 3.10         | -1.9         |
| Thiazole             | VAL978   | pi-H        | 3.44         | -0.9         |
| Thiazole             | PHE728   | pi-pi       | 3.92         | -0.0         |
| Oxazole              | PHE974   | pi-pi       | 3.68         | -0.0         |


3355 

3356

Figure 5.3. 2D graph of interaction of SNS-032 with the QZ59-RRR binding site of
P-gp using MOE software; the diagram indicates the polar and non-polar
interactions by pink or green coloured amino acids; hydrogen bonding is indicated
by green dotted arrows and pi-H interactions by green dotted lines. In this
diagram, the energy cut-off for H-bond and ionic interactions was -0.5 kcal/mol
and the maximum distance for non-bonded groups was 4.5 Å. Proximity contours are
dotted lines surrounding the ligand and indicate the shape of the binding site and
the space available to the more outward-facing parts of the ligand. Blue shadows on
some amino acids indicate differences in receptor exposure by the size and
intensity of the turquoise discs. The directions of the shadows indicate the directions of
the amino acids towards the ligand. The blue clouds around the ligand atoms
indicate the solvent exposure.

Figure 5.4. 3D diagram of the interaction of SNS-032 with the QZ59-RRR binding site
of P-gp; the pocket surface is mostly hydrophobic (green colour) and it matches
well with the hydrophobic rings of the ligand.


8653 poses were obtained after P-gp docking with 219 compounds and the top pose 
docking energy for each ligand was used as an additional descriptor. The docking 
study of P-gp inhibitors was carried out using 3D structures of mouse P-gp (Aller et 
al., 2009).
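
Selecting one docking energy per ligand from the full set of poses can be sketched as follows. The sketch assumes that the "top pose" is the pose with the lowest (most negative) docking energy and that poses are available as (ligand, energy) pairs; the function name `top_pose_energy` and the example values are hypothetical.

```python
def top_pose_energy(poses):
    """poses: iterable of (ligand, energy) pairs; returns the lowest energy per ligand."""
    best = {}
    for ligand, energy in poses:
        if ligand not in best or energy < best[ligand]:
            best[ligand] = energy
    return best


# Hypothetical energies (kcal/mol) for illustration only.
poses = [("ligand_1", -9.8), ("ligand_1", -11.2), ("ligand_2", -7.4)]
docking_descriptor = top_pose_energy(poses)
```

The resulting mapping holds one value per ligand and can be appended to the descriptor table as a single extra column.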


5.3.1.2. QSAR Models for P-gp Binding 

Various decision trees and ensemble models as well as Multivariate Adaptive 
Regression Splines (MARS) model were developed for the prediction of P-gp 
inhibition constant. Table 5.2 summarises the selected models developed using 
various statistical methods. All models obtained are cross-validated and pruned 
automatically, and the selected models are those with the lowest standard error for
3388 the internal and external test sets. Models listed in Table 5.2 are results of various

3389 feature selection and data analysis methods. The majority of these models can be easily

3390 interpreted in terms of the molecular characteristics required for an effective P-gp 

3391 inhibitor. Here we provide a brief description of the models and the inferred 

3392 molecular characteristics. The molecular descriptors employed in these models 

3393 have been described in Table 5.3. 

3394 Table 5.2. Standard error for the training and internal test sets for the selected P-gp 

3395 models 


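| Model      | Descriptors supplied | Descriptors incorporated manually | Group | Risk Estimate | Standard Error |
|------------|----------------------|-----------------------------------|-------|---------------|----------------|
| RT (2)     | All descriptors      |                                   | Train | 0.246         | 0.028          |
|            |                      |                                   | Test  | 0.810         | 0.118          |
| CHAID (1)  | All descriptors      |                                   | Train | 0.420         | 0.054          |
|            |                      |                                   | Test  | 0.672         | 0.077          |
| I-tree (3) | All descriptors      | Docking energies                  | Train | 0.448         | 0.050          |
|            |                      |                                   | Test  | 0.785         | 0.148          |
| BT (3)     | All descriptors      |                                   | Train | 0.146         | 0.013          |
|            |                      |                                   | Test  | 0.572         | 0.126          |
| RF (2)     | All descriptors      |                                   | Train | 0.438         | 0.057          |
|            |                      |                                   | Test  | 0.607         | 0.127          |
| MARS (1)   | Selected descriptors |                                   | Train | -             | 0.048          |
|            |                      |                                   | Test  | -             | 0.128          |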

3396 

3397 

Table 5.3. A brief description of the most important molecular descriptors
selected and used by the models.

Descriptor             Model                  Description
balabanJ               RT (2)                 Balaban averaged distance sum connectivity index.
b_double               RT (2)                 Number of double bonds.
Docking energy (MOE)   I-tree (3)             Docking score (kcal/mol) for enzyme-ligand docking of
                                              the compounds into the active site of P-glycoprotein
                                              (Aller et al., 2009), calculated using MOE software.
GCUT_SMR_2             BT (3)                 The GCUT descriptors using atomic contribution to
                                              molar refractivity (4 descriptors).
GCUT_SMR_3             MARS (1)               See GCUT_SMR_2.
logP (o/w)             MARS (1)               Log of the octanol/water partition coefficient.
Num Rings 3            CHAID (2)              Number of 3-membered rings.
opr_leadlike           CHAID (2)              One if and only if there are fewer than two violations
                                              of Oprea's lead-like rules, otherwise zero.
PEOE_VSA+0             RT (2)                 van der Waals surface area of atoms with atomic charge
                                              in the range [0.00, 0.05).
PEOE_VSA_HYD           RT (2), MARS (1)       Total hydrophobic van der Waals surface area.
SaaN_acnt              CHAID (2)              Count of all E-states for aromatic nitrogen atoms.
SdsCH                  CHAID (2)              Sum of all (H-C=) E-state values in the molecule.
S-FRB                  BT (3)                 Number of free rotatable bonds in the substrate.
S-HAcceptors           BT (3)                 Number of hydrogen bond acceptors in the substrate.
SHBint4_Acnt           CHAID (2)              Sum of H-bond donor and acceptor indexes separated
                                              by four skeletal bonds.
S-LogD(2)              RT (2)                 Logarithm of the distribution coefficient D of the
                                              substrate between octanol and buffer layers at pH 2.0.
SlogP                  RT (2), CHAID (2),     Octanol/water partition coefficient.
                       RF (2)
S-logP                 MARS (1), I-tree (3)   Octanol/water partition coefficient of the substrate.
SMR_VSA2               RT (2)                 Sum of approximate accessible van der Waals surface
                                              area for atoms with atomic contribution to molar
                                              refractivity in (0.26, 0.35].
SMR_VSA4               MARS (1)               Sum of approximate accessible van der Waals surface
                                              area for atoms with atomic contribution to molar
                                              refractivity in (0.39, 0.44].
S-PSA                  BT (3), MARS (1)       Polar surface area of the substrate.
SssCH2                 RT (2)                 Count of all CH2 group E-state values in the molecule.
SssS_acnt              CHAID (2)              Count of all sulphur atom (SssS) E-state values in
                                              the molecule.
SsssN                  BT (3)                 Atom-type electrotopological index for tertiary
                                              ammonium groups.
Substrate              CHAID (1)              P-gp substrate.
vsurf_CW3              RF (2)                 Capacity factor: the ratio of the hydrophilic surface
                                              over the total molecular surface, calculated at eight
                                              different energy levels (from -0.2 to -6.0 kcal/mol).
vsurf_CW4              I-tree (3)             See vsurf_CW3.
vsurf_D2               MARS (1)               Hydrophobic volume at -0.4 kcal/mol.
vsurf_D4               RF (2)                 Hydrophobic volume at -0.8 kcal/mol.
vsurf_D7               RF (2)                 Hydrophobic volume at -1.4 kcal/mol.
vsurf_D8               RT (2), RF (2)         Hydrophobic volume at -1.6 kcal/mol.
vsurf_DW13             I-tree (3)             Contact distances of the lowest hydrophilic energy
                                              descriptors (vsurf_EWmin).
vsurf_EWmin2           MARS (1)               Second lowest hydrophilic energy.
vsurf_R                RF (2)                 The surface rugosity related to the hydrophobic volume
                                              of an agent (the smaller the ratio, the larger the
                                              rugosity).
vsurf_W4               CHAID (2)              Hydrophilic volume.

5.3.1.2.1. Regression Trees

Figures 5.5 and 5.6 show the regression trees obtained using RT (2) and CHAID (1),
respectively. In the regression trees, N is the number of P-gp inhibitors, Mu is the
average and Var is the variance of log Ki in each node. Figure 5.5 shows that the
molecular descriptor selected by the C&RT algorithm for the first split of the data
is SlogP (octanol/water partition coefficient). The tree indicates that compounds
with SlogP below 3.179 are less potent inhibitors of P-gp, with an average log Ki
of 1.90. This group of compounds (node 2) may be considered non-inhibitors,
although further splitting in the tree indicates that compounds with a large
non-polar surface area (PEOE_VSA+0 > 75.6) and more than three double bonds are
reasonably good inhibitors (node 37). On the other hand, potent inhibitors are very
lipophilic (node 3), especially those having a Balaban topological index (balabanJ)
of < 0.977. This is in agreement with previous studies that have described LogP as
an important parameter in drug binding to P-gp (Lu et al., 2001; Matsson et al.,
2009; Wang et al., 2003). The significance of LogP in P-gp inhibition is due to the
presence of several lipophilic and aromatic residues in the binding sites of P-gp
(Aller et al., 2009). BalabanJ is a highly discriminating topological index which
represents the extended connectivity and the shape of molecules (Thakur et al.,
2004) and has been shown to be related to properties such as melting point and
solubility (Ghafourian and Bozorgi, 2010). This indicates the favourable
interaction of certain molecular shapes with P-gp. The nature of the substrate used
for the measurement of IC50 and log Ki values has an effect on the measured
inhibitory activity, as can be seen from the division of compounds in node 11
according to the substrate's apparent distribution coefficient at pH 2 (S-LogD(2),
where S indicates that the parameter refers to the substrate). Substrates such as
daunomycin and quinidine are basic in nature, which results in a very low
distribution coefficient at pH 2 (LogD(2) < -1.265). According to the RT model in
Figure 5.5, such substrates will result in higher measured IC50 and log Ki values
for the inhibitors. Compounds of high lipophilicity (SlogP > 3.179) may be potent
P-gp inhibitors despite the lipophilic substrates if they contain a large
hydrophobic volume at the highest hydrophobic interaction level (vsurf_D8) and a
large surface area of non-polar atoms (PEOE_VSA_HYD), especially if they are not
more lipophilic than the SlogP threshold of 5.587 (node 51). In node 13, if the
lipophilic volume is not larger than 83.75, then compounds with many -CH2- groups
(which may represent less branching) can be reasonable inhibitors (average log Ki
of 1.34).


[Figure: tree graph for log Ki; model: C&RT; number of non-terminal nodes: 10,
number of terminal nodes: 11]

Figure 5.5. RT (2) developed using the training set with the descriptors selected by
the C&RT algorithm
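
The way a regression tree such as RT (2) chooses a threshold can be illustrated with a minimal sketch. It assumes the usual C&RT criterion for a continuous response, namely picking the cut that minimises the size-weighted variance of the two child nodes. The function name `best_split` and the toy values are hypothetical and are not taken from the study's dataset.

```python
from statistics import pvariance


def best_split(x, y):
    """Return (threshold, weighted_variance) of the best binary split x <= t.

    The score is the size-weighted average of the child-node variances,
    i.e. the quantity a variance-based regression tree tries to minimise.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between equal descriptor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if score < best_score:
            # place the threshold midway between neighbouring descriptor values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score


# Hypothetical toy data: a lipophilicity descriptor and log Ki values
slogp = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]
log_ki = [2.0, 1.9, 2.1, 0.9, 1.0, 1.1]
threshold, score = best_split(slogp, log_ki)
```

In this toy example the selected cut separates the three weakly lipophilic, high log Ki compounds from the three lipophilic, low log Ki compounds, which mirrors the role of the first SlogP split in the tree.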


Figure 5.6 is the selected model developed by the CHAID (1) method. As with the
C&RT method above, the hydrophobicity descriptor, SlogP, is the first (most
important) descriptor in this CHAID (1) model. In this case compounds have been
split into three nodes, with the most lipophilic drugs having the highest inhibition
effect (node 4) and the least lipophilic compounds being the least potent or
non-inhibitors (node 2). The non-inhibitors in node 2 have been partitioned further
to separate 7 compounds with an aromatic nitrogen group in the structure
(SaaN_acnt) as the least effective inhibitors, with an average log Ki of 2.78.
Node 3 contains compounds with intermediate inhibitory activity and SlogP between
2.308 and 3.831. These compounds will be more potent if they contain a
double-bonded CH group, which is seen in compounds such as cyclosporine, valspodar,
bromocriptine and quinidine. The most hydrophobic compounds in node 4 are all
considered to be strong to moderate inhibitors of P-gp, with the log Ki in the
terminal nodes ranging from -1.46 to 1.60. In this group, compounds containing
3-membered rings (node 10) and non-lead-like molecules according to Oprea's
definition (Oprea, 2000) in node 11 are strong P-gp inhibitors. This observation
regarding the higher inhibitory activity of non-lead-like compounds is in agreement
with a recent study by Wang et al. where lead-like compounds had a lower propensity
to be P-gp substrates (Wang et al., 2011). Among these inhibitors, those with fewer
H-bond donor/acceptor pairs than two (SHBint4_Acnt) are less strong inhibitors
(node 13). In node 13, compounds containing a thioether group are exceptions with a
relatively high average log Ki value of 1.60 (SssS_acnt). The remaining 44
compounds (node 17) have high inhibitory activity towards P-gp. Oprea's lead-like
compounds in node 12 may also have strong inhibitory activity towards P-gp if the
probe substrate used in the inhibition study is daunomycin.

[Figure: tree graph for log Ki; number of non-terminal nodes: 8]

Figure 5.6. CHAID (1) developed using the training set


Although P-gp/inhibitor interaction energies from docking studies were supplied as
one of the molecular descriptors, neither of the decision tree algorithms above,
C&RT and CHAID (1), picked docking scores as a significant parameter for
partitioning of the log Ki data. This was explored further by using the docking
scores in an interactive tree, the I-tree (3) model (Figure 5.7). In this analysis
'Cross-validate tree sequence' was used in addition to V-fold cross-validation to
ensure the validity of each level of the tree for accurate prediction of log Ki in
both training and validation sets. Docking score was incorporated as the first
variable for partitioning of the data and this was found to be statistically
significant by the cross-validations. Figure 5.7 shows that the statistically
selected threshold for docking energy is -13.44 kcal/mol. Inhibitors with docking
energy below this value (node 2) will be more effective if they have a low ratio of
hydrophilic to total surface area (vsurf_CW4 < 0.539), particularly those with a
greater distance between their local hydrophilic energy minima (vsurf_DW13). The
tree shows that compounds with high docking energy (> -13.44 kcal/mol) are weak
inhibitors unless the probe substrate used in the Ki measurement is hydrophilic
(S-LogP < 0.850).

[Figure: tree graph for log Ki; number of non-terminal nodes: 4, number of
terminal nodes: 5]

Figure 5.7. I-tree (3) developed using docking energy as the first variable


5.3.1.2.2. Significance of P-gp Docking Energies 

Docking is a very useful tool in computer-aided drug discovery due to the
importance of shape-matching in drug-macromolecule interactions. It has been
postulated that compounds with shape and chemistry similar to those of a known
active molecule have a high probability of being active (Hawkins et al., 2007). On
the other hand, the interaction energy can be notoriously misleading, with large
molecular weight compounds often achieving the most negative interaction
energies because of the additive nature of the energy formula (Schulz-Gasch
and Stahl, 2004; Lipkowitz and Boyd, 2002). In our training set, the top ten
molecules with the most negative interaction energies had an average molecular
weight of 925 Da in comparison with 461 Da for the remaining compounds in the
training set. These ten compounds also had a lower average log Ki of 0.75 in
comparison with 1.28 for the remaining compounds in the training set.

The lack of flexibility of the target protein during docking should also be taken into
consideration when assessing docking results. Docking experiments are most
reliable when the interaction between a rigid protein target and a flexible ligand is
investigated (Davis and Teague, 1999). For docking results to successfully guide
the prediction of inhibitors and substrates of P-gp, they should take into account the
very flexible nature of this transporter (Teague, 2003). Previous studies
have described the importance of protein flexibility in P-gp ligand interactions (Loo
et al., 2003; Loo et al., 2009). The induced fit mechanism reflects the fact that both
drug and protein are flexible and can modify their shape to generate more
favourable contacts (Alonso et al., 2006). Current evidence shows that P-gp is able
to accommodate a wide range of substrates due to the mobile nature of its
transmembrane helices (Loo et al., 2003; Ambudkar et al., 2003). It is therefore
possible that compounds in the dataset may not be correctly identified as substrates
or inhibitors of P-gp, because the docking process does not allow the protein to be
mobile and some compounds are consequently not recognised as substrates in the
drug binding pocket. Moreover, several different but overlapping binding sites have
been identified for P-gp (Aller et al., 2009). In this study we used the binding
site defined by the cyclic hexapeptide, QZ59-RRR, in the X-ray structure of the
protein reported by Aller and co-workers.

5.3.1.2.3. Ensemble Decision Trees

Studies have shown that an ensemble of several trees may result in better prediction
accuracy when there is significant diversity among the models (Kuncheva and
Whitaker, 2003). In this investigation boosted trees and random forest were used.
The boosted trees method is an ensemble method that computes a sequence of simple
trees, each built for the prediction of the residuals of the preceding tree. Various
combinations of subsample proportions and learning rates were examined and the
best model was selected based on the prediction error for the test set. The best result
was obtained with a subsample proportion of 0.6 and a learning rate of 0.05, using
the optimum number of trees of 161. The top ten most important descriptors as
calculated by STATISTICA software are described in Table 5.3. The categorical
variable indicating the nature of the substrate was the most important BT (3)
descriptor, followed by hydrophobic volume (measured by a Volsurf descriptor) and
polarity descriptors including total polar van der Waals surface area and total
positive and negative partial charges.

Random Forest is another ensemble method, which develops a number of decision
trees using a random selection of training set compounds and molecular descriptors.
The graph of average squared error against number of trees for the training and
cross-validated test sets indicated that the test error reaches a plateau at around
50-60 trees. Therefore, the final RF model (RF (2)) containing 60 trees was used.
In this selected model, descriptors of the molecular topology of the inhibitor, such
as distance and adjacency matrix descriptors, as well as lipophilicity indicators
and Volsurf molecular interaction descriptors were ranked as the most important
descriptors. Unlike the BT (3) model, here there was only one substrate descriptor
amongst the top 10, and it ranked as the 10th most important molecular descriptor
of the model.
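
The boosted trees idea described above, a sequence of simple trees each fitted to the residuals of the ensemble so far, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses one-split stumps on a single descriptor, plain Python rather than STATISTICA, and hypothetical toy data. The default subsample proportion (0.6), learning rate (0.05) and number of trees (161) simply echo the settings reported for BT (3).

```python
import random
from statistics import fmean


def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r on one descriptor x."""
    best = None
    for t in sorted(set(x))[:-1]:  # every cut leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = fmean(left), fmean(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr


def boost(x, y, n_trees=161, learning_rate=0.05, subsample=0.6, seed=0):
    """Return a predictor built from stumps fitted to successive residuals."""
    rng = random.Random(seed)
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    k = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), k)        # random subsample of compounds
        xs = [x[i] for i in idx]
        rs = [y[i] - pred[i] for i in idx]        # residuals of current ensemble
        if len(set(xs)) < 2:
            continue                              # nothing to split on
        stump = fit_stump(xs, rs)
        stumps.append(stump)
        pred = [p + learning_rate * stump(xi) for p, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Hypothetical toy data: low descriptor values pair with high log Ki
x_toy = [float(i) for i in range(10)]
y_toy = [2.0] * 5 + [1.0] * 5
model = boost(x_toy, y_toy)
```

A small learning rate means each stump contributes only a fraction of its fitted correction, which is why many trees are needed before the training error levels off.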

5.3.1.2.4. MARS Model

Many combinations of molecular descriptors picked by several pre-processing
feature selection methods were used in MARS analysis to obtain the best possible
model, as explained in the methods section. The feature selection methods included
the Chi-square method, stepwise regression analysis, and variable importance rank
from random forest and boosted trees analyses. Previous investigations have shown
that predictor importance using random forest is a very successful feature selection
method that can be applied for reducing the data dimensionality prior to C&RT
analysis (Newby et al., 2013a). Here, the best MARS model (MARS (1)) was
obtained when the top 15 molecular descriptors from the RF model together with the
top two substrate descriptors from the BT model (S-logP and S-PSA) were given as
the independent variables. Subsequently, as a result of the pruning function in MARS
analysis, eight out of the 17 molecular descriptors were used in the selected model
(summarised in Table 5.4 below). The MARS (1) model in Table 5.4 consists of 11
basis functions, with three descriptors employed in two basis functions each and
each of the remaining five descriptors involved in one basis function. This
model does not contain any interaction term. In this MARS model, molecular
descriptors have been presented according to the rank order of their importance,
with the most important descriptor being the first one in the equation.

An interesting finding from the MARS (1) model in Table 5.4 is a knot at 5.29 for
the octanol/water partition coefficient, logP(o/w); increasing the lipophilicity of
the inhibitors leads to a reduction in log Ki values up to this point. On the other
hand, compounds with extremely high lipophilicity (logP(o/w) > 5.29) will have
increasing log Ki values (lower potency) with increasing lipophilicity. The
second and third most important descriptors of the MARS model are substrate
properties, partition coefficient (S-logP) and polar surface area (S-PSA). Inhibitors
will appear less effective (higher measured log Ki values) when the substrate is
more lipophilic at S-logP values higher than 2.14. Likewise, substrates of larger
polar surface area lead to increased log Ki values. The molecular descriptor derived
from the adjacency matrix of the inhibitors (GCUT_SMR_3) is the next most
important parameter of the model and is involved in two basis functions. In this
molecular descriptor, the diagonal of the adjacency matrix takes the atomic
contribution to molar refractivity. The basis functions indicate a positive
relationship between log Ki and this molar refractivity indicator for compounds
with GCUT_SMR_3 > 3.30, while the opposite (a negative relationship) is
observed for compounds having a lower molar refractivity indicator. In other words,
compounds with high molar refractivity are better inhibitors up to a certain
GCUT_SMR_3 threshold. In agreement with this finding, a previous study on P-gp
substrates has also indicated a minimum required molar refractivity for the
classification of compounds into the substrate category (Demel et al., 2009), but a
maximum level of molar refractivity had not been specified. vsurf_D2 is a Volsurf
molecular descriptor (Cruciani et al., 2000a) indicating the hydrophobic part of the
molecular volume. For the minority of compounds with vsurf_D2 < 493 (only 9
compounds), smaller hydrophobic volumes lead to lower log Ki values.
Another molecular descriptor indicating the hydrophobic size of the molecule
(PEOE_VSA_HYD) has appeared in two basis functions with a knot at the
descriptor value of 465. For compounds with PEOE_VSA_HYD above this
threshold value, there is a negative relation with log Ki (the higher the hydrophobic
surface area, the more effective the inhibitor). A similar trend, but with a much
higher gradient, is observed for compounds with PEOE_VSA_HYD < 465. The
second lowest hydrophilic energy (vsurf_EWmin2) (Cruciani et al., 2000a) has a
negative effect on log Ki, i.e. compounds are less effective inhibitors if the
minimum hydrophilic energy is lower than -8.64. This negative impact of a
hydrophilic interaction is only seen for the second hydrophilic region on the
molecular surface (not for the first hydrophilic region). Finally, SMR_VSA4 is the
surface area corresponding to atoms with an atomic contribution to molar
refractivity of 0.39-0.44; these are mainly conjugated nitrogen atoms such as those
in amide bonds. The MARS equation indicates that the presence of more such groups
will reduce the log Ki values (better inhibitory effect).

Table 5.4. The selected MARS (1) model

Log Ki = -0.452
         + 0.388*max(0, logP(o/w) - 5.29)
         + 0.255*max(0, 5.29 - logP(o/w))
         - 0.475*max(0, 2.14 - S-LogP)
         + 0.00463*max(0, S-PSA - 45.6)
         + 3.06*max(0, GCUT_SMR_3 - 3.30)
         + 0.938*max(0, 3.30 - GCUT_SMR_3)
         - 0.00684*max(0, 493 - vsurf_D2)
         - 0.00252*max(0, PEOE_VSA_HYD - 465)
         + 0.00512*max(0, 465 - PEOE_VSA_HYD)
         + 0.492*max(0, -8.64 - vsurf_EWmin2)
         + 0.115*max(0, 3.19 - SMR_VSA4)

N = 176
GCV error = 0.548
Mean residual = 0.000
SD(residual) = 0.645
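
To make the hinge (basis) functions of Table 5.4 concrete, the equation can be transcribed into a small function. The coefficients and knots below are copied from Table 5.4; the function and argument names are hypothetical, and the sketch makes no claim about how the original software evaluates the model.

```python
def hinge(z):
    """MARS basis function max(0, z)."""
    return max(0.0, z)


def log_ki_mars1(logp_ow, s_logp, s_psa, gcut_smr_3,
                 vsurf_d2, peoe_vsa_hyd, vsurf_ewmin2, smr_vsa4):
    """Evaluate the MARS (1) equation of Table 5.4 for one inhibitor/substrate pair."""
    return (
        -0.452
        + 0.388 * hinge(logp_ow - 5.29)
        + 0.255 * hinge(5.29 - logp_ow)
        - 0.475 * hinge(2.14 - s_logp)
        + 0.00463 * hinge(s_psa - 45.6)
        + 3.06 * hinge(gcut_smr_3 - 3.30)
        + 0.938 * hinge(3.30 - gcut_smr_3)
        - 0.00684 * hinge(493 - vsurf_d2)
        - 0.00252 * hinge(peoe_vsa_hyd - 465)
        + 0.00512 * hinge(465 - peoe_vsa_hyd)
        + 0.492 * hinge(-8.64 - vsurf_ewmin2)
        + 0.115 * hinge(3.19 - smr_vsa4)
    )


# Descriptor values placed exactly at the knots switch every hinge off,
# so the prediction reduces to the intercept.
at_knots = dict(logp_ow=5.29, s_logp=2.14, s_psa=45.6, gcut_smr_3=3.30,
                vsurf_d2=493, peoe_vsa_hyd=465, vsurf_ewmin2=-8.64, smr_vsa4=3.19)
baseline = log_ki_mars1(**at_knots)

# Moving logP(o/w) one unit to either side of the knot raises log Ki,
# which is the V-shaped dependence on lipophilicity discussed in the text.
more_lipophilic = log_ki_mars1(**{**at_knots, "logp_ow": 6.29})
less_lipophilic = log_ki_mars1(**{**at_knots, "logp_ow": 4.29})
```

Because the model contains no interaction terms, each descriptor's contribution can be read off independently in this way.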


5.3.1.2.5. Validation of Models 

All models were validated using an external validation set of 43 compounds. Table
5.5 shows the error of the selected models for the prediction of log Ki values of the
external validation set and the training set. It can be seen that the RT (2) model
gives the most accurate prediction of log Ki, followed by BT (3) and then MARS
(1). For the training set, BT (3) calculates the most accurate log Ki values, followed
by RT (2) and then the CHAID (1) model. The difference between model accuracy
for the training and validation sets may indicate overfitting to the training data.
In this case, amongst the top three models listed above, MARS (1) has the lowest
difference between the training and the validation set errors, while BT (3) has the
highest difference.

Table 5.5. Summary of the prediction accuracy of the log Ki values

Model        MAE for training set   MAE for validation set
RT (2)       0.398                  0.543
CHAID (1)    0.471                  0.603
I-tree (3)   0.690                  0.706
BT (3)       0.316                  0.568
RF (2)       0.501                  0.618
MARS (1)     0.487                  0.577
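
The comparison of training and validation errors made above can be reproduced directly from the values in Table 5.5. The following is a small sketch; the variable names are hypothetical, and the gap between the two mean absolute errors is used here only as a rough indicator of overfitting.

```python
# MAE values copied from Table 5.5: (training set, validation set)
mae = {
    "RT (2)": (0.398, 0.543),
    "CHAID (1)": (0.471, 0.603),
    "I-tree (3)": (0.690, 0.706),
    "BT (3)": (0.316, 0.568),
    "RF (2)": (0.501, 0.618),
    "MARS (1)": (0.487, 0.577),
}

# Gap between validation and training error for each model
gaps = {model: round(valid - train, 3) for model, (train, valid) in mae.items()}

# The three models with the lowest validation error
top3 = sorted(mae, key=lambda model: mae[model][1])[:3]

smallest_gap = min(top3, key=gaps.get)
largest_gap = max(top3, key=gaps.get)
```

Restricting the comparison to the three best validation performers matches the text; across all six models the ranking of gaps would differ.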

5.3.2. Prediction of Biliary Excretion Using Predicted P-gp Binding Values

Log Ki values predicted by the six models reported in section 5.3.1 were used as
independent variables along with the molecular descriptors for the prediction of
biliary excretion (log BE%). These were log Ki(RT), log Ki(CHAID), log Ki(I-tree),
log Ki(BT), log Ki(RF) and log Ki(MARS). Models for log BE% were developed using
stepwise regression analysis, C&RT, CHAID, boosted trees, random forest and
MARS. The results of these analyses are summarised in Table 5.6. As can be seen in
this table, none of the predicted log Ki values were picked by C&RT, CHAID,
stepwise regression analysis (eight parameters), Chi-square feature selection,
MARS feature selection (based on GCV error) or the 20 most important features by
random forest as a significant factor in the estimation of biliary excretion of
compounds; the exception to this was the selected BT model. As a result, the
multiple linear regression model was the same as MLR (1) (section 4.3.1), and the
regression tree and random forest models were those reported in section 4.3 (RT
(1) and RF (1)).

Table 5.6. Summary of model development for log BE% using molecular
descriptors and predicted log Ki values

Method                Predicted log Ki parameter picked   Resulting model
Stepwise regression   none                                MLR (1)
C&RT                  none                                RT (1)
RF                    none                                RF (1)
CHAID                 none                                CHAID (2)
BT                    log Ki(MARS), log Ki(RF)            BT (4)
MARS                  none                                MARS (2)
MARS                  log Ki(RF)                          MARS (3)

In this study, in addition to the methods investigated in chapter 4, CHAID and
MARS methods were also used for model development. The resulting CHAID
model (CHAID (2) in Table 5.6) did not pick any predicted log Ki parameter. This
CHAID model is presented in Figure 5.8.

Figure 5.8 shows that hydrophilic volume (vsurf_W4) is the dominant variable of
this tree (node 1), with a binary classification. According to this model, compounds
with large hydrophilic volumes are excreted in higher quantities through bile. Other
descriptors of CHAID (2) show a similar trend to the C&RT models presented in
Chapter 4 for biliary excretion. For example, hydrophilic compounds with higher
acid/base ionisation have higher biliary excretion (node 6), especially if they are
non-lead-like (node 12). Even compounds with small hydrophilic volumes can have
considerable biliary excretion if they are non-lead-like (node 4). The high biliary
excretion of non-lead-like compounds is in agreement with the results in section
5.3.1 that indicated non-lead-like compounds to be suitable P-gp substrates,
thereby aiding their biliary excretion by the efflux system. The prediction accuracy
of the CHAID (2) model is reasonably good (see Table 5.7). The risk estimate and
standard error are 0.322 for the training set and 0.254 for the validation set.

[Figure: CHAID graph for log BE%; number of non-terminal nodes: 7, number of
terminal nodes: 8]

Figure 5.8. CHAID (2) developed using the training set with the descriptors
selected by the CHAID algorithm


Table 5.7. Error of biliary excretion (log BE%) prediction by the selected models

Model       MAE for training set   MAE for validation set
BT (4)      0.339                  0.416
CHAID (2)   0.432                  0.359
MARS (2)    0.438                  0.428
MARS (3)    0.436                  0.442


As seen in Table 5.6, the log Ki values predicted by the MARS (1) and RF (2) models
(log Ki(MARS) and log Ki(RF)) were two of the most important features in the boosted
trees analysis for the prediction of biliary excretion. The selected BT model
(BT (4)) has similar prediction accuracy to the BT models without P-gp information
(compare BT (1) and BT (2) models in Table 4.5 with BT (4) in Table 5.7).
Lipophilicity parameters (LogD (6.5), LogD (7.4)), shape indexes (Kier2, Kier3 and
KierA3) and Volsurf descriptors indicating hydrophilic ratio (vsurf_CW2 and
vsurf_CW4) were amongst the top 15 descriptors of the BT (4) model. The optimal
number of trees in this model was 156 (Figure 5.9). Statistical parameters of this
boosted trees model are reported in Table 5.7.


[Figure: summary of boosted trees; response: log%DoseExcre; average squared error
against number of trees for train and test data; optimal number of trees: 156,
maximum tree size: 3]

Figure 5.9. Average squared error of log BE% against the number of trees in the
boosted trees model BT (4) for the training and internal test sets


MARS models were developed using a number of descriptor sets as explained in
the methods section. The best MARS model was MARS (2), using the features
selected by the Chi-square feature selection method (Table 5.8). The second best
model was MARS (3) in which, in addition to the Chi-square feature predictors, the
predicted log Ki values (from the RF model) were also used as independent
variables. According to MARS (2) and (3), increasing the number of sulphur atoms up
to two will increase biliary excretion, with no further increase observed with more
sulphur atoms. All the remaining molecular descriptors of MARS (2) are Volsurf
descriptors of hydrophilic volume and hydrogen bond donor capacity measured at
different energy levels. The MARS (3) equation in Table 5.9 indicates that weaker
P-gp binders (compounds with higher predicted log Ki values) will have reduced
log BE%. In MARS (3), in addition to the Volsurf (vsurf) variables similar to those
of MARS (2), compounds that are drug-like according to Lipinski's rules
(lip_druglike) are indicated to have lower biliary excretion, which is a similar
pattern to that observed with P-gp binding.


Table 5.8. The selected MARS (2) model (Feature selection)

Log BE% = -3.14
          + 4.99*max(0, vsurf_HB3 - 8.58)
          - 3.74*max(0, 9.12 - vsurf_W2)
          + 1.63*max(0, vsurf_W4 - 1.49)
          + 3.21*max(0, vsurf_W2 - 1.24)
          - 1.99*max(0, 2.00 - a_nS)
          - 1.17*max(0, vsurf_W3 - 8.07)
          + 8.547*max(0, 8.07 - vsurf_W3)
          - 1.14*max(0, vsurf_HB4 - 1.96)

N = 168
GCV error = 0.398
Mean residual = 0.000
SD(residual) = 0.573


Table 5.9. The selected MARS (3) model (Feature selection and RF predictor)

Log BE% = 8.270
          - 1.240*max(0, vsurf_HB4 - 2.67)
          + 2.867*max(0, vsurf_HB3 - 8.58)
          + 5.52*max(0, 8.58 - vsurf_HB3)
          - 3.98*max(0, vsurf_W2 - 9.12)
          + 6.88*max(0, vsurf_W4 - 1.49)
          + 3.33*max(0, vsurf_W2 - 1.24)
          - 1.59*max(0, 2.00 - a_nS)
          - 5.70*max(0, log Ki(RF) - 1.90)
          - 3.66*max(0, lip_druglike - 0.00)

N = 168
GCV error = 0.397
Mean residual = 0.000
SD(residual) = 0.565


5.4. Discussion 

5.4.1. Structural Determinants of Potent P-gp Inhibitors 

Inhibitors of P-gp can be competitive inhibitors that bind to the substrate
binding site, or non-competitive inhibitors that bind to other distinct binding
sites such as the ATP-binding site. An investigation that involved docking of
multispecific inhibitors into the ATP-binding domain of P-gp has shown that some of
the less lipophilic inhibitors can bind to this site, which may contribute to their
inhibitory activity (Neuhoff et al., 2000). On the other hand, the more common,
lipophilic inhibitors do not interact with the ATP-binding domain of P-gp.
Inhibitors from the steroid and flavonoid chemotypes are examples that may bind to
the ATP-binding site (Conseil et al., 1998; Broccatelli et al., 2011). The
inhibitors in the training set in this study did not contain any flavonoids but did
contain five steroid structures: testosterone, progesterone, spironolactone,
digoxin and cortisol. These steroids are also expected to bind to the substrate
binding site. For example, studies of several sex-steroid hormones have shown that
these are substrates of P-gp mediated transport as well as being P-gp inducers (Kim
and Benet, 2004), and digoxin is also a known substrate of P-gp as well as acting
as an inhibitor (de Lannoy and Silverman, 1992).

From the description of the models outlined above, it can be seen that lipophilicity
is the key factor for P-gp inhibition, along with the molecular topology and the size
of the inhibitors as well as the nature of the substrate probe. In terms of
lipophilicity, a higher partition coefficient than that recommended for drug-like
molecules (based on Lipinski's or Oprea's rules) seems to improve the inhibitory
activity towards P-gp. According to the best model (RT), the ideal lipophilicity is
a SlogP value in the range (3.179, 5.587]. A similar pattern can be observed in the
MARS model, where a lipophilicity threshold of 5.29 has been indicated. Previous
studies using classification models have found a higher lipophilicity (log P) for
multispecific inhibitors of P-gp in comparison with non-inhibitors (Broccatelli et
al., 2011; Matsson et al., 2009), although these studies have not specified a
maximum lipophilicity threshold. For P-gp substrates, an even higher lipophilicity
requirement has been reported in an investigation using a large set of proprietary
GSK compounds, i.e. a log P > 4 for the substrate class (Gleeson, 2008).

In addition to the partition coefficient, other lipophilicity measures, which also
indicate the size of the lipophilic regions, are found to have an impact. A large
hydrophobic volume (vsurf_D8) (Cruciani et al., 2000a) in the RT model and a
large hydrophobic surface area (PEOE_VSA_HYD) in the MARS and RT models
improve the potency of the inhibitors. These two parameters are indicators of both
size and lipophilicity. The positive impact of large molecular size and
lipophilicity is in agreement with the known structure of P-gp and its proposed
substrate binding pocket, where the large binding site of P-gp consists of a
considerable number of lipophilic amino acids (Song et al., 2010). The descriptor
PEOE_VSA_HYD has also been used by Demel et al. for the classification of
substrates/nonsubstrates, which indicates that compounds with PEOE_VSA_HYD > 300,
log P < 7 and more than seven hydrogen bond acceptor groups are substrates of P-gp
(Demel et al., 2009). Lipophilicity and molecular size have also been indicated in
local QSAR models for individual classes of modulators/substrates (Wang et al.,
2003).

3759 In addition, the higher inhibitory activity of non-lead-like compounds (based on

3760 Oprea’s definition) in the CHAID model (CHAID (1)) may also indicate the positive

3761 effect of a larger molecular size and a higher lipophilicity than those of lead-like molecules.

3762 Compounds that pass the Oprea test are defined as compounds with

3763 molecular weight ≤ 460 Da, -4 ≤ Log P ≤ 4.2, Log Sw ≥ -5, number of

3764 rotatable bonds ≤ 10, number of rings ≤ 4, number of hydrogen donors ≤ 5, and

3765 number of hydrogen acceptors ≤ 9 (Oprea, 2000). According to this CHAID

3766 model, compounds that violate more than two of the above rules are better

3767 inhibitors of P-gp. A close inspection of such compounds indicates that higher

3768 lipophilicity or more hydrogen bonding groups, as well as larger molecular size and a higher

3769 number of rings, are the reasons for the violations that result in these compounds being

3770 potent inhibitors. Examples are paclitaxel, nicardipine and vinblastine.
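
The rule counting described above can be sketched in a few lines of Python. This is only an illustration: the thresholds are the lead-likeness criteria as read from the paragraph above (Oprea, 2000), the function name and the two example compounds are hypothetical, and the descriptor values are invented rather than taken from the dataset.

```python
def count_leadlike_violations(mw, log_p, log_sw, rot_bonds, rings,
                              h_donors, h_acceptors):
    """Count how many lead-likeness criteria a compound fails.

    Thresholds follow the criteria quoted in the text (Oprea, 2000);
    the log P window is treated as a single rule (an assumption).
    """
    checks = [
        mw <= 460,             # molecular weight in Da
        -4 <= log_p <= 4.2,    # partition coefficient window
        log_sw >= -5,          # aqueous solubility
        rot_bonds <= 10,       # rotatable bonds
        rings <= 4,            # number of rings
        h_donors <= 5,         # hydrogen bond donors
        h_acceptors <= 9,      # hydrogen bond acceptors
    ]
    return sum(not ok for ok in checks)


# Hypothetical descriptor values, for illustration only.
small_polar = count_leadlike_violations(300.0, 2.0, -3.0, 4, 2, 2, 5)
large_lipophilic = count_leadlike_violations(850.0, 5.5, -6.5, 14, 7, 4, 14)

# The CHAID (1) split discussed above corresponds to "more than two violations".
print(small_polar > 2, large_lipophilic > 2)
```

Under this reading, a large, lipophilic, ring-rich molecule accumulates violations quickly, which is consistent with the examples named in the text.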

3771 Another significant molecular determinant of P-gp inhibition is molecular

3772 topology and shape, as described by the adjacency and distance matrix descriptors

3773 such as the connectivity index BalabanJ in the RT (2), GCUT descriptors in the

3774 MARS model and VDistMa in the BT (3). Broccatelli and co-workers (Broccatelli

3775 et al., 2011) have also hypothesised that an optimal shape may exist for P-gp

3776 inhibitors, but the optimal shape needs to have adequate lipophilicity and H-bond

3777 acceptor ability. H-bond acceptor ability has also been emphasised by Demel et al.

3778 (2009), who show the importance of a high number or a large surface

3779 area of H-bond acceptor groups. In the models presented in this study, the effect of

3780 H-bonding can be seen in CHAID (1), where compounds containing more than two

3781 internal H-bonds are more effective inhibitors. The MARS model also indicates the

3782 positive impact of the presence of conjugated nitrogen groups (e.g. amides). A number

3783 of molecular descriptors which may indicate H-bonding effect are present in RF 

3784 and BT models, including negative charge weighted surface area (CASA-) and 

3785 partial charge descriptors which are indicators of H-bonding (Dearden and 

3786 Ghafourian, 1999). It must be noted that these parameters as well as the parameters 

3787 of Demel et al. may also relate to the molecular size as larger molecules are more 

3788 likely to contain many H-bond groups. 

3789 

3790 5.4.2. Effect of Substrate on the Ki Measured for the Inhibitors

3791 According to Harvey and Ferrier (2011), any substance that can diminish the 

3792 velocity of an enzyme-catalysed reaction is called an inhibitor. As stated earlier in

3793 this chapter, the two most commonly encountered types of reversible inhibition are competitive

3794 and non-competitive (Harvey and Ferrier, 2011). In the competitive type, the inhibitor

3795 binds reversibly (through non-covalent bonds) to the same active site that the substrate

3796 would normally occupy and thus competes with the substrate for that site. A

3797 competitive inhibitor will increase the apparent Km for a given substrate, but the

3798 Vmax does not change (Harvey and Ferrier, 2011). Non-competitive inhibitors

3799 bind non-covalently to a site other than the active site and change the conformation

3800 of the enzyme. Unlike competitive inhibitors, non-competitive inhibitors cannot

3801 be overcome by increasing the concentration of substrate and therefore, these 

3802 inhibitors decrease the apparent Vmax of the reaction. Also, non-competitive 

3803 inhibitors do not interfere with the binding of substrate to enzyme, hence the 

3804 enzyme shows the same Km in the presence or absence of the non-competitive 

3805 inhibitor (Harvey and Ferrier, 2011). 
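
The two cases described above can be summarised with the standard Michaelis-Menten rate equations. These are the textbook forms, not expressions taken from this thesis; the inhibitor concentration [I] and the inhibition constant K_i are symbols introduced here for illustration.

```latex
% Competitive inhibition: apparent K_m increases, V_max is unchanged
v = \frac{V_{\max}\,[S]}{K_m\left(1 + \dfrac{[I]}{K_i}\right) + [S]}

% Non-competitive inhibition: apparent V_max decreases, K_m is unchanged
v = \frac{V_{\max}}{1 + \dfrac{[I]}{K_i}} \cdot \frac{[S]}{K_m + [S]}
```

In the first expression a sufficiently high [S] restores the uninhibited rate, whereas in the second the maximum rate remains reduced at any [S], which matches the verbal description above.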

3806 It has been suggested that there are several binding sites for the molecularly diverse 

3807 spectrum of P-gp substrates, inhibitors and modulators. For example, using 

3808 equilibrium and kinetic radioligand binding assays, Martin et al. established the

3809 presence of at least four distinct interaction sites on P-gp which were able to 

3810 communicate allosterically (Martin et al., 2000). Moreover, various competitive, 

3811 cooperative allosteric and anticooperative allosteric interactions are possible

3812 between the substrates and the regulators (Lu et al., 2001). As a result, the

3813 inhibitory activity measured using different substrates will be different for the same

3814 inhibitor (Rautio et al., 2006). The X-ray structure of mouse P-gp, which has 87%

3815 sequence identity to human P-gp, has recently been described (Aller et al., 2009). It

3816 was found that P-gp can distinguish between different 3D shapes, and that 

3817 stereoisomers may bind to different binding locations. Given the complexity of the 

3818 binding locations and modes of inhibition, it has been suggested that a single 

3819 pharmacophore cannot effectively describe the inhibitors of various P-gp substrates, 

3820 and therefore, for the inhibition of the transport of different P-gp substrates 

3821 different pharmacophores have been proposed (Ekins and Erickson, 2002). The 

3822 modelling strategy in this investigation should be able to deal with the diversity of 

3823 the binding sites. In particular, molecular descriptors of the substrates were 

3824 incorporated in the model development in addition to molecular descriptors of 

3825 inhibitors. Moreover, a categorical variable was implemented in all the decision 

3826 tree models and ensembles. A regression tree is a powerful data mining tool that is

3827 able to select the important features for dividing the data into high or low activity

3828 groups (distinct groups of compounds with high or low average log Ki values). The

3829 models described above indicate the importance of substrate in the measured 

3830 inhibitory activity as all the models contain at least one substrate descriptor selected 

3831 by the feature selection methods. 

3832 The average prediction error was calculated separately for the inhibitors of each

3833 substrate. Table 5.10 gives the mean absolute error (MAE) of log Ki prediction for

3834 inhibitors of different substrates using the selected models. The table shows that, on

3835 average, the models predict the inhibitory activity measured with calcein as the probe substrate with the highest

3836 accuracy. The rank order of the average prediction error (for the external validation

3837 set) from the lowest to the highest is for the inhibitors of calcein, digoxin,

3838 vinblastine, daunomycin, irinotecan and quinidine as the probe substrates. The 

3839 lower average error for a specific substrate’s inhibitors may be associated with the 

3840 number of inhibitors of that substrate in the training set, an indication of which is 

3841 the number in the training and validation set shown in Table 5.10. 

3842 Table 5.10. Number of inhibitors of different substrates and MAE of log Ki

3843 prediction for the training and validation set

Substrate       n    RT (2)   CHAID (1)   I-tree (3)   RF (2)   BT (3)   MARS (1)

Training Set
Abacavir        1    0.845    0.755       0.866        0.366    0.689    0.647
Calcein         49   0.347    0.550       0.579        0.344    0.378    0.311
Daunomycin      18   0.519    0.628       0.826        0.733    0.598    0.539
Digoxin         71   0.534    0.766       0.591        0.576    0.629    0.549
Fexofenadine    1    0.764    0.839       0.625        0.597    0.521    0.916
Irinotecan      1    0.897    0.784       0.894        1.180    1.113    1.238
Prazosin        8    0.597    0.637       0.593        0.927    0.493    0.472
Quinidine       2    1.001    0.091       1.257        1.298    1.279    1.634
Vinblastine     25   0.492    0.759       0.563        0.923    0.410    0.464

Validation Set
Calcein         14   0.388    0.556       0.609        0.356    0.380    0.300
Daunomycin      4    0.668    0.658       0.869        0.850    0.735    0.735
Digoxin         18   0.574    0.985       0.754        0.601    0.634    0.611
Irinotecan      2    1.005    0.726       1.223        1.365    1.418    1.517
Quinidine       1    1.270    0.033       1.809        1.771    1.249    2.582
Vinblastine     5    0.668    0.934       0.866        0.696    0.435    0.559
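
A per-substrate breakdown of this kind can be computed as sketched below. The snippet is a minimal illustration only: the function name, the record layout and the example values are hypothetical and are not taken from the thesis data.

```python
from collections import defaultdict


def mae_by_substrate(records):
    """Group absolute prediction errors by probe substrate.

    records: iterable of (substrate, observed_log_ki, predicted_log_ki).
    Returns {substrate: (number_of_inhibitors, mean_absolute_error)}.
    """
    abs_errors = defaultdict(list)
    for substrate, observed, predicted in records:
        abs_errors[substrate].append(abs(observed - predicted))
    return {s: (len(e), sum(e) / len(e)) for s, e in abs_errors.items()}


# Hypothetical observed/predicted log Ki values, for illustration only.
example = [
    ("calcein", -5.0, -5.5),
    ("calcein", -6.0, -5.5),
    ("digoxin", -4.0, -5.0),
]
result = mae_by_substrate(example)
print(result)
```

Reporting n next to each MAE, as the table does, matters because substrates represented by only one or two inhibitors give very unstable error estimates.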


3844 

3845 

3846 5.4.3. Effect of P-gp Binding on Biliary Excretion Models 

3847 It can be seen from the results that the use of predicted P-gp binding values did not 

3848 lead to improved models for biliary excretion, and log Ki was selected only by the

3849 BT (4) and MARS (3) models. However, similarities can be observed between

3850 molecular determinants of P-gp binding and biliary excretion. For example, Oprea’s

3851 lead-like compounds have lower P-gp binding (as seen in CHAID (1) in Figure 5.6) 

3852 as well as having lower biliary excretion (CHAID (2) in Figure 5.8). Also, 

3853 Lipinski’s drug like compounds with a similar definition to Oprea’s rule show 

3854 lower biliary excretion according to and MARS (3) in Table 5.9. This may relate to 

3855 larger MWs observed for both the prominent substrates of P-gp and cholephilic 

3856 compounds. However, there are also differences in structural requirements for these 

3857 two biological properties. Lipophilicity is a major contributor to P-gp binding 

3858 (Gleeson, 2008), which requires even higher log P than drug-like molecules as seen 

3859 from the MARS (1), CHAID (1) and RT (2) models in section 5.3.1. The effect of

3860 lipophilicity on biliary excretion is different, with large hydrophilic molecules being

3861 more prone to biliary excretion, as lipophilic compounds go through metabolism

3862 instead (Sharifi and Ghafourian, 2014). This result should not be considered

3863 contradictory, as metabolism and biliary excretion are simultaneous processes in

3864 hepatocytes and the overall effect is determined by the kinetics. It may be

3865 speculated that large lipophilic compounds could be excreted through

3866 bile if their metabolism were limited or slowed down.

3867 In analysing the effect of P-gp binding on the observed in vivo biliary excretion

3868 levels of compounds, one should also consider that the P-gp binding data were

3869 obtained from in vitro experiments using different cell cultures. Such in vitro

3870 systems may not realistically represent the in vivo situation with healthy hepatocytes

3871 in their natural liver environments. Moreover, P-gp is only one of the several efflux 

3872 pumps that operate in hepatocytes. 

3873 One possible reason for the ‘predicted P-gp binding’ not being selected by several 

3874 feature selection methods could be the poor prediction of P-gp binding for the 

3875 external (biliary excretion) dataset. Although the prediction accuracy of the

3876 P-gp binding QSARs for the external validation set has been shown to be satisfactory

3877 (Table 5.5), the accuracy of prediction of P-gp binding for the biliary excretion dataset cannot

3878 be assessed, as experimental values are not available for this dataset. Poor

3879 prediction accuracy may occur if the diversity of compounds differs between

3880 the two datasets, which may result in the biliary excretion dataset falling outside the

3881 applicability domain of the P-gp models. According to Netzeva et al. (2005), an

3882 applicability domain needs to be defined for QSAR models when they are used for external

3883 predictions. In order to investigate this, principal component analysis (PCA) was

3884 performed using all the molecular descriptors. Figure 5.10 shows the scores plot of

3885 PC1 against PC2. It can be seen in the figure that, despite a very good overlap, there

3886 are many compounds in the BE dataset on the left-hand side of the figure which are

3887 outside the range of, and further away from, the P-gp dataset.

3888
Figure 5.10. Scores plot indicating biliary excretion dataset (BE) and the P-gp
binding dataset (P-gp)
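
The visual comparison in the scores plot could be complemented by a simple numerical check. The sketch below is not the procedure used in this study: it assumes that the PC1 and PC2 scores have already been computed, and it flags query compounds that fall outside the per-component minimum and maximum of the reference set (a plain bounding-box definition of the applicability domain). The function name and all score values are hypothetical.

```python
def outside_reference_range(reference_scores, query_scores):
    """Return indices of query points lying outside the per-component
    min/max range of the reference set (bounding-box domain check)."""
    n_components = len(reference_scores[0])
    lows = [min(p[k] for p in reference_scores) for k in range(n_components)]
    highs = [max(p[k] for p in reference_scores) for k in range(n_components)]
    return [
        i for i, point in enumerate(query_scores)
        if any(not (lows[k] <= point[k] <= highs[k]) for k in range(n_components))
    ]


# Hypothetical (PC1, PC2) scores, for illustration only.
pgp_scores = [(-2.0, 1.0), (0.5, -1.5), (3.0, 2.0)]   # reference (P-gp) set
be_scores = [(-6.0, 0.0), (1.0, 1.0), (2.5, -3.0)]    # query (BE) set

flagged = outside_reference_range(pgp_scores, be_scores)
print(flagged)
```

A bounding box is the crudest of the common domain definitions; distance-based or density-based measures would be stricter, at the cost of more parameters to choose.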


5.5. Conclusion 

In order to develop accurate models for P-gp inhibition, this study used Ki
values of a large set of P-gp inhibitors, calculated from the reported IC50 and the
probe substrate’s Km and concentration values from the literature using the Cheng and
Prusoff equation. In comparison with IC50, this parameter allows a better
comparison between inhibitory activities measured using different probe substrates 
and substrate concentrations. In addition to the molecular descriptors of the 
inhibitors, this QSAR study also incorporated the molecular descriptors calculated 
for the probe substrate as the nature of the substrate used in the experiment may 
affect the inhibitory activity of the inhibitor. 
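
The conversion mentioned above can be illustrated with the Cheng and Prusoff relationship in its usual form for competitive inhibition, Ki = IC50 / (1 + [S]/Km). Whether this exact form was applied to every data point is an assumption here; the function name and the numerical values are hypothetical.

```python
def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Convert IC50 to Ki, assuming competitive inhibition.

    All three arguments must be in the same concentration units;
    the returned Ki is in those units as well.
    """
    if km <= 0:
        raise ValueError("Km must be positive")
    if ic50 < 0 or substrate_conc < 0:
        raise ValueError("concentrations must be non-negative")
    return ic50 / (1.0 + substrate_conc / km)


# Hypothetical values: IC50 = 10, [S] = 5 and Km = 5 (same units).
ki = cheng_prusoff_ki(ic50=10.0, substrate_conc=5.0, km=5.0)
print(ki)
```

When the probe substrate concentration is far below its Km, the correction factor approaches 1 and Ki approaches IC50, which is why values measured with different substrates and concentrations become more comparable after the conversion.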

The study resulted in a few models that were predictive of the inhibition constant,
as judged by the accuracy of the prediction for the external validation set. The results
indicated that substrate parameters were important for the prediction of the
inhibitory activity, as all feature selection procedures selected at least one substrate
molecular descriptor in addition to the molecular descriptors of the inhibitors. This
3908 study also showed that docking scores are not good predictors of inhibitory activity.

3909 When used as a molecular descriptor, docking scores were not selected by any of 

3910 the feature selection methods described here. When docking scores were 

3911 incorporated manually in C&RT analysis, the resulting regression tree had a high 

3912 error for the prediction of the validation set. The most significant models indicated

3913 that potent inhibitors are more lipophilic than lead-like compounds. The potent

3914 inhibitors had a high molecular weight, a high number, surface area or volume of

3915 hydrophobic groups, and conjugated nitrogen groups (e.g. amides).

3916 The best model was a regression tree that was obtained using C&RT analysis. A 

3917 boosted trees model was the second best followed by a MARS equation. Both the 

3918 regression tree and the MARS model are simple and interpretable and the statistical 

3919 parameters indicate that they have a lower chance of overfitting in comparison to 

3920 the boosted trees model. 

3921 When the P-gp models were used for the prediction of P-gp binding for the 

3922 compounds in the biliary excretion dataset, the predicted log Ki values were not

3923 picked by several feature selection methods, or when picked (boosted trees and

3924 MARS methods), the accuracy of the resulting biliary excretion models was not

3925 improved (compare BT (4) with BT (1) and BT (2) or MARS (2) with MARS (3)).

3926 This may be attributed to a number of factors including: 1) P-gp is only one of the

3927 several efflux pumps operating in hepatocytes, and 2) the limited overlap between

3928 the chemical diversity of the dataset used for the P-gp binding models and that of the

3929 biliary excretion dataset may have led to poor prediction of Ki values for

3930 compounds in the biliary excretion dataset.





3931 6. Inhibitory Effect of OATPs in Biliary Excretion 

3932 

3933 6.1. Introduction 

3934 Several members of the organic anion transporting polypeptide (OATP) family 

3935 have been shown to be specifically expressed in the liver and facilitate the liver 

3936 uptake of their substrate drugs. Mechanistic studies suggest an important role for 

3937 the OATP family in the uptake of compounds from blood into hepatocytes, across the

3938 basolateral (sinusoidal) membrane (Yamazaki et al., 1996). After being transported

3939 into hepatocytes, these compounds are either metabolised or secreted

3940 into the bile by ATP-dependent transporter proteins such as P-gp and MRP2

3941 (Ayrton and Morgan, 2001). In fact, uptake by OATP transporters has often been

3942 regarded as the single most important uptake mechanism involved in biliary

3943 excretion (Fenner et al., 2012; Varma et al., 2012). For example, studies on lipid-

3944 lowering drugs have shown that inhibition of OATP1B1 hepatic uptake can

3945 considerably increase statin concentration in blood after administration of

3946 cyclosporine, a potent inhibitor of various OATPs (Shitara et al., 2003; Ho et al.,

3947 2006), and similar results were later obtained by Neuvonen and co-workers for

3948 other statins (Neuvonen et al., 2006).

3949 Through their role in biliary excretion, OATPs also contribute to drug-drug 

3950 interaction events (Koenen et al., 2011). As mentioned above, cyclosporin is a 

3951 potent inhibitor of OATPs (in particular OATP2B1 and OATP1B1) and it is, at the 

3952 same time, a substrate of CYP3A4, thereby functioning as a competitive inhibitor 

3953 resulting in increased exposure to other CYP3A4 substrates (Wacher et al., 1998).

3954 In addition, this compound interacts with P-gp (Foxwell et al., 1989) and MRP2

3955 (Tang et al., 2002a). These efflux pumps are expressed in the canalicular membrane

3956 of hepatocytes. As a result of all these enzyme and transporter interactions, this

3957 drug has an impact on the biliary elimination of substrate compounds. Owing to the

3958 importance of transporters in drug-drug interactions, the early identification and

3959 kinetic characterization of OATP ligands during drug evaluation

3960 has recently become important for successful drug development (De Bruyn et al., 2013).

3961 Unfortunately, studies on OATPs are limited due to the lack of highly specific

3962 inhibitors/substrates for this family of transporters. For example, in the sinusoidal

3963 hepatocyte membrane, apart from OATP1B1 and OATP1B3, which are expressed

3964 abundantly, OATP1A2 is also present in a smaller quantity. All three of these

3965 transporters are able to take up pitavastatin in human hepatocytes. To elucidate which

3966 OATP is actually responsible for pitavastatin uptake, Hirano and colleagues

3967 investigated the relative contribution of OATP1B1 to the hepatic uptake of

3968 pitavastatin. This was done by inhibiting the hepatic uptake of pitavastatin using

3969 estradiol-17β-D-glucuronide as an OATP1B1/OATP1B3 inhibitor and estrone-3-

3970 sulphate as an OATP1B1/OATP2B1 inhibitor, and comparing the results. The

3971 study supported the idea that OATP1B1 is the predominant transporter for the

3972 hepatic uptake of pitavastatin (Hirano et al., 2006). 

3973 The lack of an X-ray crystal structure is a further limitation of OATP research for

3974 the design of specific modulators. For example, ligand-enzyme docking requires

3975 an accurate high-resolution structure of the protein (Rognan, 2013). In a recent

3976 investigation, a high-throughput in vitro transporter inhibition assay was reported

3977 for the OATP1B subfamily (De Bruyn et al., 2013). This approach was able to

3978 identify 212 and 139 molecules as inhibitors of OATP1B1 and OATP1B3, respectively.

3979 Many OATPs share common substrates. The currently known OATP substrates are relatively

3980 large, ranging from 334 Da (benzylpenicillin) to 1143 Da (cholecystokinin

3981 octapeptide). The structural templates of many OATP substrates are

3982 steroidal or peptidic (You and Morris, 2007). The substrate specificity of OATP1B1

3983 is similar to that of OATP1B3, and both transport a varied range of compounds including

3984 bile acids, conjugates of sulphate and glucuronate, steroid conjugates, thyroid

3985 hormones, peptides and amphiphilic organic drugs (Glaeser and Kim, 2006;

3986 Leuthold et al., 2009; Hagenbuch and Meier, 2003; Tirona et al., 2001; Hsiang et al.,

3987 1999; Konig et al., 2000a). Many solutes transported by OATPs are negatively

3988 charged; however, there are several examples of neutral (e.g. digoxin) and cationic

3989 (e.g. N-methylquinidine) substrates. Several OATP substrates are promiscuous but 

3990 there are also some selective substrates. For example, the cholecystokinin 

3991 octapeptide is a selective OATP1B3 substrate (Nozawa et al., 2003). 

3992
3993 The aim of this investigation was to incorporate information from OATP binding in

3994 order to improve the accuracy of the predicted biliary excretion. This work was carried

3995 out in two stages: 1) developing the predictive models for OATP inhibition; and 2) 

3996 using the models for the prediction of OATP effect for the compounds in biliary 

3997 excretion dataset. OATP models consisted of both regression type (continuous) 

3998 models and classification type models. Unfortunately, there is a lack of sufficient 

3999 quantitative data on OATP substrates and non-substrates (especially for OATP1B3

4000 and OATP2B1). In a recent study, Varma et al. (2012) compared the chemical space

4001 of a list of OATP substrates with that of cholephilic compounds. This study suffers 

4002 from a lack of non-substrate compounds that limits any quantitative conclusion. 

4003 Karlgren and co-workers (2012a) have recently published a relatively large dataset 

4004 of OATP inhibition effect measured using high-throughput methods. The measured 

4005 values are percentage inhibition of a probe substrate’s uptake by a large set of 

4006 compounds. It is noted that a single-point inhibition measure (percentage 

4007 inhibition) that uses only one inhibitor concentration is not as reliable as IC50 for

4008 measuring the inhibition activity. Moreover, direct kinetics measures for the 

4009 substrates would have been the ideal parameter for this investigation. Despite this, 

4010 considering that most enzyme inhibitors are usually also the substrates of the same 

4011 enzyme (competitive inhibition), this percentage inhibition dataset was used in this 

4012 investigation. Single-point inhibition assays have proven useful in the past for

4013 fast screening of compound activity and selectivity. An example is the comparable

4014 accuracy of models based on single-point CYP inhibition measures and those built

4015 from IC50 data (Carlson and Fisher, 2008).

4016 

4017 6.2. Methods 

4018 6.2.1. Dataset 

4019 The dataset of 225 compounds collated, or experimentally determined, by Karlgren

4020 and co-workers (2012a) was used in this study. The OATP subfamily members

4021 OATP1B1, OATP1B3 and OATP2B1 were included in the dataset. A total of 142

4022 compounds in this dataset were from an earlier investigation (Karlgren et al., 2012b),

4023 which was then expanded to include compounds known to interact with OATPs or

4024 CYP enzymes (Karlgren et al., 2012a). The compounds were from the chemical

4025 space of oral drugs (Karlgren et al., 2012a). The data consisted of percentage OATP

4026 inhibition by the compounds.

4027 The experimental measurements were performed using human embryonic

4028 kidney 293 (HEK293) cells stably transfected with OATP1B1, OATP1B3 or

4029 OATP2B1. In the screening experiments to measure the interaction of the 225

4030 compounds with each individual OATP, a concentration of 20 µM of the

4031 compounds was used. The substrates used in the inhibition studies were estradiol-

4032 17β-glucuronide for OATP1B1 and OATP1B3, and estrone-3-sulfate for

4033 OATP2B1. The substrate concentration was 0.52 µM in the inhibition of OATP1B1-

4034 mediated estradiol-17β-glucuronide uptake. In the inhibition of OATP1B3-

4035 mediated estradiol-17β-glucuronide uptake, the substrate concentration was

4036 1.04 µM, and in the inhibition of OATP2B1-mediated estrone-3-sulfate uptake, the

4037 substrate concentration was 1.02 µM.

4038 The PCA of the dataset indicates that the compounds are well distributed in the oral

4039 drug space, within the 95% confidence interval. The dataset included 43% neutral

4040 compounds, 29% negatively charged, 22% positively charged and 6% zwitterionic

4041 compounds at pH 7.4 (Karlgren et al., 2012a).

4042 For development of QSAR models for OATP interaction, both classification and 

4043 prediction (regression based) methods were used. The continuous (numerical) 

4044 percentage inhibition data were used for regression based analyses. For 

4045 classification methods, compounds were considered as inhibitors if they 

4046 significantly decreased the uptake of the substrate by at least 50%. In this case, 78 

4047 compounds (out of 225 compounds) were OATP1B1 inhibitors, while 46 and 45

4048 compounds (out of 225) were OATP1B3 and OATP2B1 inhibitors, respectively. In

4049 the dataset, a few compounds stimulated OATP-mediated transport (instead of

4050 inhibiting it). Clotrimazole, fendiline, progesterone and testosterone are examples

4051 of stimulators (Karlgren et al., 2012a). In this investigation, all such compounds

4052 were considered as non-inhibitors in classification studies. 
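
The class assignment described above can be sketched as follows. This is a simplified illustration: it applies only the 50% cut-off and does not model the statistical significance criterion mentioned in the text; stimulators are assumed to appear as negative percentage inhibition values. The function name and example values are hypothetical.

```python
def classify_inhibition(percent_inhibition, threshold=50.0):
    """Label a compound from its single-point percentage inhibition.

    Values at or above the threshold are 'inhibitor'; everything else,
    including stimulators (negative inhibition), is 'non-inhibitor'.
    """
    return "inhibitor" if percent_inhibition >= threshold else "non-inhibitor"


# Hypothetical percentage inhibition values, for illustration only.
labels = [classify_inhibition(v) for v in (82.0, 50.0, 49.9, -20.0)]
print(labels)
```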

4053 A total of 387 2D and 3D molecular descriptors were calculated for the OATP dataset

4054 using the same methods and software as explained in Chapter 4.
4055

4056 6.2.2. QSAR Model Development and Validation 

4057 6.2.2.1. OATP Models 

4058 Both regression-based and classification models were developed for OATP 

4059 interaction. The regression-based models were built using the linear and non-linear methods of

4060 stepwise regression analysis, C&RT, BT, RF and MARS. The classification method

4061 was C&RT. All statistical analyses were performed using STATISTICA Data

4062 Miner v11 (StatSoft Ltd.).

4063 The compounds were divided into a training set and an external validation set. Models

4064 were developed using the training set compounds and assessed using the external

4065 validation set. To divide the compounds, they were ordered according to their

4066 inhibition percentage and, from every set of five compounds, four were allocated

4067 to the training set and one to the external validation set at random. In this way, the

4068 training set consisted of 180 compounds and the external validation set consisted of

4069 45 compounds. For the analytical methods that required parameter optimization, a

4070 fraction of the training set compounds was randomly assigned to an internal validation

4071 set, or alternatively cross-validation was used if the option was available in the

4072 statistical software. For the internal validation set, where applicable, the risk 

4073 estimate and standard error were calculated in STATISTICA software and used as 

4074 the performance indicators. 
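
The allocation scheme described above (rank by activity, then hold out one compound at random from each consecutive block of five) can be sketched as below. The function name, the fixed seed and the placeholder activity values are assumptions made for illustration; the original split was performed in the statistical software.

```python
import random


def stratified_split(values, group_size=5, seed=0):
    """Sort compound indices by activity and, from each consecutive block
    of `group_size`, move one randomly chosen index to the validation set."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    order = sorted(range(len(values)), key=lambda i: values[i])
    training, validation = [], []
    for start in range(0, len(order), group_size):
        block = order[start:start + group_size]
        held_out = rng.choice(block)
        validation.append(held_out)
        training.extend(i for i in block if i != held_out)
    return training, validation


# Placeholder percentage inhibition values for 225 compounds (hypothetical).
percent_inhibition = [float(i % 97) for i in range(225)]
train_idx, valid_idx = stratified_split(percent_inhibition)
print(len(train_idx), len(valid_idx))
```

Because every block of five neighbouring activity values contributes exactly one validation compound, both sets cover a similar range of the response.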

4075 In OATP modelling using boosted trees, the default values for learning rate, the 

4076 number of additive terms (number of trees), random test data proportion (fraction of

4077 data points in testing pool) and subsample proportion were 0.1, 200, 0.2 and 0.5, 

4078 respectively. In addition to the default values, various subsample proportions of 0.4, 

4079 0.45, 0.50, 0.55 and 0.60 were examined in combination with the learning rates of 

4080 0.1 and 0.05. The best OATP models were selected based on the performance 

4081 indicators for the internal validation set.

4082 6.2.2.2. Biliary Excretion Models

4083 QSAR models were developed for biliary excretion using the dataset and methods 

4084 explained in Chapter 4. In addition to the molecular descriptors, the OATP effects 

4085 predicted by the selected models from section 6.2.2.1 were used as the independent 

4086 variables of the analyses. To this end, the selected OATP models from section 

4087 6.2.2.1 were used to predict OATP interaction (percentage inhibition values or 

4088 inhibitor/non-inhibitor classes) for the compounds in the biliary excretion dataset (n =

4089 217). In addition to the C&RT method, interactive C&RT was used, in which the

4090 predicted OATP effects were manually incorporated in the models, when they were 

4091 not picked by C&RT feature selection automatically. 

4092 

4093 6.3. Results 

4094 It has been suggested in the literature that the presence of OATPs in hepatocytes may

4095 indicate their significance in the biliary excretion process (Matsushima et al., 2005;

4096 Pfeifer et al., 2014; Shitara et al., 2013). Binding data for 225 compounds and three major

4097 sub-family members of hepatic organic anion transporting polypeptides (OATP

4098 transporters) were available for this analysis. These sub-family members were OATP1B1,

4099 OATP1B3 and OATP2B1. The ratios of inhibitors to non-inhibitors were different

4100 for each of these three proteins, as can be seen in Table 6.1. A total of 387

4101 molecular descriptors were used for the QSAR model development for the training

4102 set consisting of 180 compounds. The method of data allocation into training and test

4103 sets outlined in the methods section ensured that these sets contained similar ranges 

4104 of percentage inhibition values. The lipophilicity (LogP by ACD software) was 

4105 between -4.73 and 8.51 for the training set, and -3.26 and 7.28 for the validation set 

4106 with similar mean values of 2.43 and 2.58 respectively. Molecular weights of the 

4107 compounds were between 129-1214 Da for the training set and 94-1202 Da for the 

4108 validation set, with mean values of 405 and 392 respectively. 

4109
4110 Table 6.1. Number of inhibitor/non-inhibitor compounds based on 50% inhibition

4111 for each OATP sub-family member

Transporter   Inhibitor   Non-inhibitor   Total
OATP1B1       78          147             225
OATP1B3       46          179             225
OATP2B1       45          180             225


4112 

4113 Several QSAR models were developed for each sub-family of OATP transporter 

4114 using the training set compounds. Based on the prediction error for the validation 

4115 sets, two QSAR models were selected for the prediction of binding to each OATP 

4116 for the biliary excretion dataset. Section 6.3.1 gives a brief description of the 

4117 regression-based models, while section 6.3.2 gives a description of the classification

4118 models for OATP interaction. The results of using the predicted OATP effects as

4119 the independent variables (descriptors) of the biliary excretion models are

4120 presented in section 6.3.3.

4121 

4122 6.3.1. Regression Models for Binding to OATP Transporters 

4123 The percentage inhibition of OATP transport of a probe substrate by the compounds was

4124 analysed in this study to develop QSAR models. The inhibition data

4125 showed normal distributions with ‘Skewness’ values of 0.163, 0.328 and -3.03;

4126 logarithmic transformation of these data led to more skewed distributions. As a

4127 result, QSAR models were developed with percentage inhibition as the dependent

4128 variable (non-logarithmic scale). Several QSAR models were developed for each

4129 sub-family member of OATP, including multiple linear regression analysis, C&RT,

4130 boosted trees, random forest, MARS and support vector machine analysis. The two best

4131 models for each OATP sub-family member, based on the lowest error rate in the validation

4132 set, were selected and are presented below.
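
The comparison of skewness before and after logarithmic transformation can be illustrated with the sketch below. It uses the simple moment-based (population) skewness; the statistical software used in this study may apply a bias-corrected estimator, so values would not match exactly. The function name and the data are hypothetical.

```python
import math


def moment_skewness(values):
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, roughly symmetric percentage inhibition values.
raw = [5.0, 20.0, 40.0, 60.0, 80.0, 95.0]
logged = [math.log10(x) for x in raw]

raw_skew = moment_skewness(raw)
log_skew = moment_skewness(logged)
print(raw_skew, log_skew)
```

For data that are already roughly symmetric on the original scale, a log transform compresses the upper tail and stretches the lower one, producing negative skew, which is the pattern the paragraph above reports as the reason for modelling on the non-logarithmic scale.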

4133 

4134
4135 6.3.1.1. Selected OATP1B1 Models

4136 Random Forest

4137 A random forest model was the best model for the estimation of OATP1B1

4138 percentage inhibition values of the external validation set. The selected best RF

4139 model was achieved using the number of trees set at 100, a subsample proportion of

4140 0.50, and a random test data proportion of 0.3. Figure 6.1 shows the error

4141 decreasing as the number of trees increases, and reaching a clear plateau by 100 trees.

4142 The prediction accuracy of this model is presented in Tables 6.2 and 6.3. The mean

4143 absolute error values for the training and validation sets are ~18 and ~21,

4144 respectively. It must be noted here that the errors correspond to the percentage 

4145 inhibition values in non-logarithmic scale which explains the higher order of the 

4146 observed error. 

4147 The most important descriptor (based on predictor importance in STATISTICA) for 

4148 this model is VAdjMA, which is a bond count descriptor and defines the number of 

4149 heavy-heavy bonds in the molecule. The other molecular descriptors, in the top ten 

4150 important molecular descriptor list, were Chil, the molecular connectivity index, 

4151 b_heavy, number of bonds between heavy atoms, SMRJVSA3, the surface area 

4152 corresponding to atoms with (0.35, 0.39] atomic contribution to molar refractivity, 

4153 VS A, the total van der Waals surface area, Kierl, molecular shape index, logP 

4154 calculated by ACD software, and the maximum positive hydrogen atom-level E- 

4155 state value in a molecule (Hmaxpos). 
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Summary of Random Forest 
Response: OATP1B1 

Number of trees: 100; Maximum tree size: 100 



-Train data 

, „ _ , Number of Trees -Test data 

4156 

4157 Figure 6.1. OATP1B1-RF model. Average squared error of OATP1B1 against the 

4158 number of trees in the random forest model (RF) for the training and internal test 

4159 set 

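Table 6.2. Statistical parameters of the selected models for training and
internal test sets

OATP subfamily | Model | Group | Risk Estimate | Standard Error
OATP1B1 | OATP1B1-RF | Train | 525 | 61.1
OATP1B1 | OATP1B1-RF | Validation | 737 | 135
OATP1B1 | OATP1B1-RT | Train | 512 | 58.1
OATP1B1 | OATP1B1-RT | Validation | 690 | 141
OATP1B3 | OATP1B3-BT | Train | 487 | 61.7
OATP1B3 | OATP1B3-BT | Validation | 775 | 212
OATP1B3 | OATP1B3-RF | Train | 473 | 104
OATP1B3 | OATP1B3-RF | Validation | 704 | 165
OATP2B1 | OATP2B1-BT | Train | 1959 | 729
OATP2B1 | OATP2B1-BT | Validation | 1068 | 239
OATP2B1 | OATP2B1-RF | Train | 1693 | 698
OATP2B1 | OATP2B1-RF | Validation | 987 | 215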

























Table 6.3. Summary of the prediction accuracy of the selected QSAR models for
the training and external validation sets

OATP subfamily | Selected Model | MAE for training set | MAE for validation set
OATP1B1 | OATP1B1-RF | 17.6 | 21.0
OATP1B1 | OATP1B1-RT | 20.6 | 21.0
OATP1B3 | OATP1B3-RF | 15.8 | 20.1
OATP1B3 | OATP1B3-BT | 16.6 | 20.3
OATP2B1 | OATP2B1-RF | 24.3 | 24.9
OATP2B1 | OATP2B1-BT | 27.3 | 25.2
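The mean absolute error (MAE) reported in Table 6.3 is the average absolute difference between observed and predicted percentage inhibition. A minimal sketch of the calculation is given below; the observed and predicted values are hypothetical and the function name is an illustrative assumption.

```python
def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over all compounds."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


# Hypothetical percentage inhibition values (illustration only).
observed = [10.0, 45.0, 80.0, 62.0]
predicted = [25.0, 30.0, 70.0, 90.0]

# Absolute errors are 15, 15, 10 and 28, so the MAE is 68 / 4 = 17.0.
mae = mean_absolute_error(observed, predicted)
```

Because the errors are on the percentage inhibition scale, an MAE of about 20 corresponds to an average miss of about 20 percentage points.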


Regression Tree (RT)

The second best QSAR model for OATP1B1 inhibition was a regression tree from
C&RT analysis. The RT was generated using all molecular descriptors, with
cross-validation applied at the default V-value of 10, using the interactive
C&RT routine in STATISTICA. This RT has only one split, based on chi1_C, the
carbon valence connectivity index (a topological descriptor). According to this
tree, compounds with chi1_C > 9.698 bind more strongly to OATP1B1, with an
average percentage inhibition of ~68% (node 3). This RT is presented in Figure
6.2. Table 6.3 shows that, despite the very simple nature of this regression
tree, its prediction accuracy for the external validation set is similar to
that of the RF model described earlier.

[Figure 6.2: tree graph for OATP1B1; number of non-terminal nodes: 1; number of
terminal nodes: 2.]

Figure 6.2. The selected RT model for OATP1B1 inhibition developed using
C&RT analysis.
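A one-split regression tree such as the one described above can be sketched as a search for the threshold that minimises the summed squared error of the two resulting nodes. The sketch below is not the STATISTICA implementation; the descriptor and inhibition values are hypothetical and the function names are illustrative assumptions.

```python
def best_single_split(x, y):
    """Return (threshold, left_mean, right_mean, sse) for the best split x <= t."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical descriptor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[3]:
            best = (threshold, left_mean, right_mean, sse)
    return best


def predict(split, value):
    """Predict with the node mean on the matching side of the split."""
    threshold, left_mean, right_mean, _ = split
    return left_mean if value <= threshold else right_mean


# Hypothetical chi1_C values and percentage inhibition (illustration only).
chi1_c = [4.0, 6.0, 8.0, 9.0, 10.5, 12.0, 14.0]
inhibition = [20.0, 30.0, 25.0, 35.0, 70.0, 65.0, 75.0]

split = best_single_split(chi1_c, inhibition)
```

With these values the selected threshold falls between 9.0 and 10.5, and every compound above it receives the same prediction (the mean of the upper node), which is how a single-split tree assigns one average percentage inhibition per terminal node.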


6.3.1.2 Selected OATP1B3 Models
Random Forest

The best model for the prediction of OATP1B3 inhibition for the external
validation set was obtained using random forest analysis with a subsample
proportion of 0.60 and the other statistical parameters set to their defaults,
including a random test data proportion of 0.3 and 100 trees (Figure 6.3).

The most important molecular descriptor of the RF model for OATP1B3 is
VAdjEq, a bond count descriptor that defines the number of heavy-heavy
bonds in the molecule. The other most important descriptors of the model are
the number of single bonds (b_single), VolSurf descriptors indicating hydrogen
bonding donor capacity, molecular wrinkled surface and molecular volume
(vsurf_HB6, vsurf_R and vsurf_V, respectively) and molar refractivity (SMR).

[Figure 6.3: plot of average squared error (train data and test data) against
number of trees. Summary of Random Forest; response: OATP1B3; number of trees:
100; maximum tree size: 100.]

Figure 6.3. Average squared error of prediction of OATP1B3 inhibition against the
number of trees in the selected RF model.


Boosted Trees 

Boosted trees analysis using various combinations of model parameters gave the
second best model for the prediction of the OATP1B3 percentage inhibition of
the external validation set. In this BT model, the optimal number of trees was
54, with a learning rate of 0.05 and a subsample proportion of 0.55. Tables 6.2
and 6.3 summarise the statistical parameters of the OATP1B3 models. The graph
of average squared error against the number of trees for the training and
cross-validated test sets is presented in Figure 6.4.

The most significant molecular descriptors of this model, in descending order
of significance, are LogD(10), the apparent partition coefficient at pH 10;
FiA, the fraction of compound ionised as an acid at pH 7.4; SaaCH, the
atom-type electrotopological index for aromatic CH groups; SaaCH_acnt, the
number of aromatic CH groups; the VolSurf descriptors vsurf_IW4 and vsurf_IW5
(hydrophilic integy moments at different energy levels from -0.2 to 1.6
Kcal/mol) and vsurf_W5 (hydrophilic volume); and SHBint4, the internal hydrogen
bonding index for groups separated by four skeletal bonds.

[Figure 6.4: plot of average squared error (train data and test data) against
number of trees, with the optimal number marked. Summary of Boosted Trees;
response: OATP1B3; optimal number of trees: 54; maximum tree size: 3.]

Figure 6.4. Average squared error against the number of trees in the selected BT
model for OATP1B3 inhibition.
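The idea of an optimal number of trees can be sketched with a small least-squares boosting loop: each one-split tree is fitted to the current residuals, its contribution is shrunk by the learning rate, and the number of trees giving the lowest test-set error is reported. This is a simplified sketch rather than the STATISTICA algorithm: it uses a single descriptor, omits the random subsampling step, and all data and function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals; return a predictor."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, lm, rm)
    _, threshold, lm, rm = best
    return lambda v: lm if v <= threshold else rm


def mean_squared_error(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def boost(x_train, y_train, x_test, y_test, max_trees=200, learning_rate=0.05):
    """Return (optimal number of trees, test error after each tree)."""
    base = sum(y_train) / len(y_train)
    pred_train = [base] * len(y_train)
    pred_test = [base] * len(y_test)
    test_errors = []
    for _ in range(max_trees):
        residuals = [y - p for y, p in zip(y_train, pred_train)]
        stump = fit_stump(x_train, residuals)
        pred_train = [p + learning_rate * stump(v)
                      for p, v in zip(pred_train, x_train)]
        pred_test = [p + learning_rate * stump(v)
                     for p, v in zip(pred_test, x_test)]
        test_errors.append(mean_squared_error(y_test, pred_test))
    optimal = min(range(max_trees), key=lambda k: test_errors[k]) + 1
    return optimal, test_errors


# Hypothetical descriptor values and percentage inhibition (illustration only).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_train = [10.0, 12.0, 15.0, 40.0, 45.0, 50.0, 48.0, 55.0]
x_test = [1.5, 3.5, 5.5, 7.5]
y_test = [14.0, 30.0, 44.0, 50.0]

optimal_trees, test_errors = boost(x_train, y_train, x_test, y_test)
```

The training error keeps falling as trees are added, whereas the test error eventually stops improving; stopping at the test-set minimum is what limits overfitting.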


6.3.1.3 Selected OATP2B1 Models 
Random Forest 

A random forest model was the best model for the prediction of OATP2B1 binding
of the external validation set compounds. The prediction error for the training
and internal test sets as a function of the number of trees is presented in
Figure 6.5. This model was obtained with a subsample proportion of 0.55 and the
default parameters of the software. Hmaxpos (the maximum positive hydrogen
atom-level E-state value in a molecule) is the most significant molecular
descriptor of this selected RF model for OATP2B1. Two BCUT descriptors, using
atomic contributions to molar refractivity (BCUT_SMR_3) and lipophilicity
(BCUT_SLOGP_3), as well as the total polar van der Waals surface area
(Q_VSA_POL) and the fractional negative van der Waals surface area
(Q_VSA_FNEG), were the other most important variables of this model.

[Figure 6.5: plot of average squared error (train data and test data) against
number of trees. Summary of Random Forest; response: OATP2B1; number of trees:
100; maximum tree size: 100.]

Figure 6.5. Average squared error for the training and internal test sets against the
number of trees in the selected RF model for OATP2B1 inhibition.


Boosted Trees 

The second best QSAR for the prediction of OATP2B1 binding for the external
validation set was obtained using BT analysis with a maximum number of trees
of 200, a learning rate of 0.05 and a subsample proportion of 0.45. In the
selected BT model, the optimal number of trees for predicting OATP2B1 binding
of the internal test set was only two (Figure 6.6). Tables 6.2 and 6.3 give a
summary of the statistical parameters for the OATP2B1 models.

The most important descriptors in the boosted trees analysis were a_ICM, the
entropy of the element distribution in the molecule; the ratio of carbon atoms
in the molecule (C ratio); atom-type electrotopological state indices for
various types of carbon atoms (SssssC, SsssCH and SdssC); and the maximum
hydrogen atom-level E-state values in a molecule (Hmaxpos and Hmax).

[Figure 6.6: plot of average squared error (train data and test data) against
number of trees, with the optimal number marked. Summary of Boosted Trees;
response: OATP2B1; optimal number of trees: 2; maximum tree size: 3.]

Figure 6.6. Average squared error for the training and internal test sets against the
number of trees in the selected BT model for OATP2B1 inhibition.


6.3.2. Classification Models for Binding to OATPs 

Because percentage inhibition data are less accurate than the more ideal
Ki or IC50 data, classification models were investigated in addition to the
prediction (regression) type QSAR models. Classification using C&RT analysis
was carried out for the dataset of each OATP. Initially, all 387
molecular descriptors were set as independent variables, and the inhibitor or
non-inhibitor class (based on a 50% inhibition threshold) was set as the
dependent categorical variable. In this way, the classification tree selects
the most significant descriptors from the pool of 387 descriptors for each
split. Figures 6.7, 6.8 and 6.9 show the classification trees for OATP1B1,
OATP1B3 and OATP2B1, respectively. Table 6.4 shows the predictive performance
measures of the classification trees. Sensitivity (SE) is the fraction of
inhibitors predicted correctly, and specificity (SP) is the fraction of
non-inhibitors predicted correctly. Recall that SE, SP and SP x SE should be
maximized.
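The class assignment and the performance measures defined above can be sketched as follows. This is a minimal illustration with hypothetical class labels; treating a value of exactly 50% as an inhibitor is an assumption, since the text does not state how the boundary case was handled, and the function names are illustrative.

```python
def to_class(percent_inhibition, threshold=50.0):
    """Return 1 (inhibitor) or 0 (non-inhibitor); '>=' is an assumption."""
    return 1 if percent_inhibition >= threshold else 0


def sensitivity_specificity(observed, predicted):
    """Return (SE, SP, SP x SE) for binary classes coded 1 and 0."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    se = tp / (tp + fn)  # inhibitors predicted correctly
    sp = tn / (tn + fp)  # non-inhibitors predicted correctly
    return se, sp, se * sp


# Hypothetical observed and predicted classes (illustration only).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

se, sp, product = sensitivity_specificity(observed, predicted)
```

The product SP x SE penalises a model that does well on only one class. For example, the validation row for CT (1) in Table 6.4 gives 0.806 x 0.736, which is approximately 0.593.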
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Figure 6.7 shows the classification tree for OATP1B1 binding (CT (1)). Similar to 
the RT model for OATP1B1 (Figure 6.2), the descriptor chi1_C is the first split
variable of CT (1). The cut-off point for the inhibitor class is chi1_C > 9.68,
which is also similar to that of the OATP1B1 RT model. Larger molecules
containing many carbon atoms are classed as inhibitors, with very few
exceptions. One example of such exceptions is compounds with a very low ratio
of hydrophilic to lipophilic regions (vsurf_HL1 < 0.05). Compounds classed as
non-inhibitors in node 2 are further divided, so that they are classed as
inhibitors if they are very lipophilic (LogD(2) > 4.06), if they contain an
acidic group (partially charged hydrogen atom) (Hmin > 1.39), or if they have a
large total negative van der Waals surface area (PEOE_VSA_NEG > 204.43).

[Figure 6.7: tree graph for inhibitor and non-inhibitor class; number of
non-terminal nodes: 8; number of terminal nodes: 9.]

Figure 6.7. CT (1) graph for the best model selecting all descriptors for OATP1B1
50% inhibition


The classification tree for OATP1B3 (CT (2)) is presented in Figure 6.8. The most
important molecular property for OATP1B3 inhibitors is a high ratio of rotatable
(single) bonds to the total number of bonds in the molecule (b_rotR > 0.3). These
flexible molecules need to have a relatively small fraction of polar (to total)
surface area to be classed as OATP1B3 inhibitors (Q_VSA_FPOL < 0.36). On the
other hand, more rigid molecules can be inhibitors if they have a large total
negative polar surface area (Q_VSA_PNEG > 175.05) or a large BCUT_SMR_1.
Otherwise, compounds with a large difference between positively charged and
negatively charged surface area (DASA) need a low BCUT_SMR_2 (< 0.067) as well
as a low BCUT_SLOGP_1 (< -0.47), whereas compounds with a small difference
between positively charged and negatively charged surface area need a large
contact distance between the hydrophilic interaction centres of the molecule
(vsurf_DW13).

Figure 6.8. CT (2) graph for the best model selecting all descriptors for OATP1B3
50% inhibition


Figure 6.9 shows the classification tree for OATP2B1 (CT (3) model). The first
split variable here is vsurf_W1, indicating that more hydrophilic drugs
(vsurf_W1 > 1412.81) are inhibitors of this transporter, especially if they
have a low total positive partial charge calculated by the PEOE method
(PEOE_PC+ < 4.52) but a total positive partial charge calculated by the AM1
semiempirical method higher than 4.04. Less lipophilic compounds need a
GCUT_SLOGP_0 value higher than -0.79 (node 15) to be classed as OATP2B1
inhibitors.

Figure 6.9. CT (3) graph for the best model selecting all descriptors for OATP2B1
50% inhibition

Table 6.4 shows that the sensitivity and specificity values are generally good,
especially for the classification model for OATP1B1 inhibition (CT (1)). All
models show better statistics for the training set than for the validation set.
The specificity of CT (2) is particularly low for the external validation set.
This means that CT (2) cannot classify the non-inhibitors of OATP1B3
accurately, whereas it can predict the inhibitors reasonably well.

Table 6.4. Results of classification analysis using C&RT routines for OATP1B1,
OATP1B3 and OATP2B1

OATP subfamily | Model | Set | SP x SE | SE | SP
OATP1B1 | CT (1) | Train | 0.938 | 0.989 | 0.949
OATP1B1 | CT (1) | Validation | 0.593 | 0.806 | 0.736
OATP1B3 | CT (2) | Train | 0.753 | 0.942 | 0.800
OATP1B3 | CT (2) | Validation | 0.300 | 0.828 | 0.363
OATP2B1 | CT (3) | Train | 0.622 | 0.882 | 0.705
OATP2B1 | CT (3) | Validation | 0.447 | 0.773 | 0.578


6.3.3. QSAR Models for Biliary Excretion Using OATP Effects

The selected regression-based models from section 6.3.1 were used to predict
the percentage OATP inhibition by the compounds in the biliary excretion
dataset. The predicted OATP binding parameters included percentage OATP1B1
inhibition by the RF and RT methods (OATP1B1-RF and OATP1B1-RT), percentage
OATP1B3 inhibition by the RF and BT methods (OATP1B3-RF and OATP1B3-BT) and
percentage OATP2B1 inhibition by the RF and BT methods (OATP2B1-RF and
OATP2B1-BT). These parameters were used as numerical variables in the
development of QSAR models for the biliary excretion of compounds. Moreover,
the classification trees from section 6.3.2, CT (1) - CT (3), were used to
predict the OATP inhibitor/non-inhibitor classes of the compounds in the
biliary excretion dataset. The predicted classes were used as categorical
variables in the QSAR model development using the biliary excretion dataset.
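The way predicted OATP effects enter the biliary excretion models can be sketched as adding model outputs to each compound's descriptor set. The sketch below is illustrative only: the two "models" are simplified stand-ins that use just the first split reported for the OATP1B1 trees, the value 30.0 for the lower node is hypothetical, and the descriptor values and function names are assumptions.

```python
def add_predicted_effects(descriptor_rows, models):
    """Return copies of the descriptor rows with model predictions appended."""
    augmented = []
    for row in descriptor_rows:
        new_row = dict(row)  # leave the original descriptors untouched
        for name, model in models.items():
            new_row[name] = model(row)
        augmented.append(new_row)
    return augmented


def oatp1b1_rt(row):
    # Stand-in for the one-split RT: ~68% above the chi1_C threshold;
    # the value for the other node (30.0) is hypothetical.
    return 68.0 if row["chi1_C"] > 9.698 else 30.0


def oatp1b1_class(row):
    # Stand-in for CT (1), reduced to its first split only (an assumption).
    return 1 if row["chi1_C"] > 9.68 else 0


# Hypothetical compounds from a biliary excretion dataset.
compounds = [
    {"chi1_C": 12.0, "logP": 2.1},
    {"chi1_C": 5.0, "logP": 0.4},
]

augmented = add_predicted_effects(
    compounds,
    {"OATP1B1-RT": oatp1b1_rt, "Predicted OATP1B1 Class": oatp1b1_class},
)
```

The augmented rows then carry both the original molecular descriptors and the predicted OATP effects, so a tree-building routine can select either kind of variable for a split.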

6.3.3.1. Regression Tree Models Using Predicted OATP Effects

C&RT analysis was used to develop a regression tree in which log BE% was the
dependent continuous variable and the predicted OATP effects, along with the
molecular descriptors, were the independent variables (predictors of the
model). The resulting RT (3) model for the training set is presented in Figure
6.10. The molecular descriptors employed in the trees are explained in Table
6.5.

It can be seen in Figure 6.10 that one of the predicted OATP effects, the
percentage inhibition of OATP1B3 predicted by the RF model (OATP1B3-RF), has
been selected by the tree. According to this model, and in agreement with the
QSARs discussed earlier for biliary excretion (MLR (1), MARS (2) and MARS (3)),
compounds with a large H-bond donor capacity (vsurf_HB3) have higher biliary
excretion. The biliary excretion rises further if the compounds have high
acid/base dissociation (fU < 0.001), as seen with previous models such as
RT (1) (Figure 4.3). On the other hand, compounds with a lower H-bond donor
capacity and a small negatively charged surface area (Q_VSA_NEG < 195.42) are
mainly non-inhibitors of OATP1B3 (45 out of 49 compounds in node 4) with a low
level of biliary excretion. The few compounds that have been predicted by the
RF method to be OATP1B3 inhibitors have a very low log BE% (node 7). It must be
noted that this result contradicts the expectation that compounds binding to
OATP1B3 should be more predisposed to biliary excretion. Tables 6.6 and 6.7
provide the statistical parameters of this regression tree, along with those of
all the other models.


[Figure 6.10: tree graph for log BE%; model: C&RT; number of non-terminal
nodes: 4; number of terminal nodes: 5; split variables shown include Q_VSA_NEG
and OATP1B3-RF.]

Figure 6.10. RT (3) developed using the training set with the descriptors selected by
C&RT
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4360 Table 6.5. A brief description of the most important molecular descriptors selected 

and used by the models.

Descriptor | Model | Description
a_count | RF (3) | Number of atoms.
ASA- | RF (3) | Water accessible surface area of all atoms with negative partial charge (strictly less than 0).
balabanJ | CT (2) | Balaban averaged distance sum connectivity index (Balaban, 1982).
b_1rotN | RF (3) | Number of rotatable single bonds (conjugated single bonds, e.g. ester and peptide bonds, are not included).
b_rotR | CT (2) | Fraction of rotatable single bonds (b_rotN divided by the number of bonds between heavy atoms).
b_single | RF (3) | Number of single bonds.
BCUT_PEOE_2 | BT (5), I-Tree (9) | The BCUT descriptor (see Table 4.2) using PEOE atomic partial charges.
BCUT_SLOGP_1 | CT (2) | The BCUT descriptor using atomic contribution to logP instead of partial charge.
BCUT_SMR_1 | CT (2) | The BCUT descriptor using atomic contribution to molar refractivity.
BCUT_SMR_2 | CT (2) | The BCUT descriptor using atomic contribution to molar refractivity.
chi1v | RF (3) | Atomic valence connectivity index.
chi1_C | CT (1) | Carbon connectivity index.
DASA | CT (2) | Absolute value of the difference between ASA+ and ASA-.
dens | BT (5) | Mass density: molecular weight divided by van der Waals volume as calculated in the vol descriptor.
density | I-Tree (4) | Molecular mass density: weight divided by vdw_vol (amu/Å^3).
fiB | CT (1) | Fraction of compound ionised as a base at pH 7.4.
fU | RT (3), I-Tree (4), I-Tree (5), I-Tree (6), I-Tree (9) | Fraction of compound unionised at pH 7.4.
GCUT_PEOE_0 | I-Tree (4) | The GCUT descriptors (see Table 4.2) using PEOE atomic charge.
GCUT_PEOE_2 | BT (5), I-Tree (6) | See GCUT_PEOE_0.
GCUT_SLOGP_0 | CT (3) | The GCUT descriptors using the atomic contribution to logP.
GCUT_SLOGP_3 | I-Tree (7) | See GCUT_SLOGP_0.
glob | I-Tree (9) | Molecular globularity. Globularity, or inverse condition number, is the smallest eigenvalue divided by the largest eigenvalue of the covariance matrix of atomic coordinates. A value of 1 indicates a perfect sphere, while a value of 0 indicates a two- or one-dimensional object.
Hmax | BT (5) | Maximum hydrogen E-state atom-level value in a molecule.
Hmaxpos | BT (5) | Maximum positive hydrogen atom-level E-state value in a molecule.
Hmin | I-Tree (9), CT (1) | Minimum hydrogen E-state atom-level value in a molecule.
Kier2 | I-Tree (6) | Second order kappa shape index: (n-1)^2 / m^2 (Hall et al., 2007).
KierA2 | RF (3) | Second order alpha modified shape index: s (s-1)^2 / m where s = n + a.
KierFlex | I-Tree (6) | Kier molecular flexibility index: (KierA1) (KierA2) / n (Hall et al., 2007).
LogD(10) | BT (5) | Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 10.
LogD(5.5) | BT (5) | Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 5.5.
LogD(6.5) | BT (5), I-Tree (5) | Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 6.5.
LogD(7.4) | BT (5) | Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 7.4.
LogD(2) | CT (1) | Logarithm of the distribution coefficient D of a compound between octanol and buffer layers at pH 2.
MW | I-Tree (7) | Molecular weight.
OATP1B1-RF | I-Tree (4), I-Tree (7) | Percentage inhibition of OATP1B1 predicted by the RF model with a subsample proportion of 0.50.
OATP1B3-RF | RT (3) | Percentage inhibition of OATP1B3 predicted by the RF model with a subsample proportion of 0.60.
OATP1B3-BT | I-Tree (5), I-Tree (7) | Percentage inhibition of OATP1B3 predicted by the BT model (with a subsample proportion of 0.55 and a learning rate of 0.05).
OATP2B1-RF | I-Tree (6) | Percentage inhibition of OATP2B1 predicted by the RF model (with a subsample proportion of 0.55).
PC+ | CT (3) | Total positive partial charge.
PEOE_PC+ | RF (3), CT (3) | Total positive partial charge.
PEOE_VSA_HYD | I-Tree (5) | Total hydrophobic van der Waals surface area. This is the sum of the van der Waals surface area of atoms for which the absolute value of the atomic charge is less than or equal to 0.2.
PEOE_VSA_NEG | CT (1) | Total negative van der Waals surface area.
PEOE_VSA+0 | I-Tree (5) | Van der Waals surface area of atoms with atomic charge in the range [0.00, 0.05).
PEOE_VSA-0 | I-Tree (9) | Van der Waals surface area of atoms with atomic charge in the range [-0.05, 0.00).
PEOE_VSA+4 | RF (3), I-Tree (10) | Van der Waals surface area of atoms with atomic charge in the range [0.20, 0.25).
Predicted OATP1B1 Class | I-Tree (8) | Categorical descriptor (0 or 1) giving the OATP1B1 inhibitor/non-inhibitor class predicted by the C&RT model.
Predicted OATP1B3 Class | I-Tree (9) | Categorical descriptor (0 or 1) giving the OATP1B3 inhibitor/non-inhibitor class predicted by the C&RT model.
Predicted OATP2B1 Class | I-Tree (10) | Categorical descriptor (0 or 1) giving the OATP2B1 inhibitor/non-inhibitor class predicted by the C&RT model.
Q_VSA_FPOL | CT (2) | Fractional polar van der Waals surface area. This is the sum of the van der Waals surface area of atoms for which the absolute value of the atomic charge is greater than 0.2, divided by the total surface area.
Q_VSA_PNEG | CT (2) | Total negative polar van der Waals surface area. This is the sum of the van der Waals surface area of atoms with an atomic charge less than -0.2.
Q_VSA_NEG | RT (3), I-Tree (5) | Total negative van der Waals surface area. This is the sum of the van der Waals surface area of atoms with a negative atomic charge.
SMR_VSA2 | I-Tree (7) | Sum of approximate accessible van der Waals surface area for atoms with atomic contribution to molar refractivity in (0.26, 0.35].
vdw_area | I-Tree (4) | The van der Waals surface area (Å^2) calculated using a connection table approximation.
vsurf_D7 | CT (1) | Hydrophobic volume (8 descriptors).
vsurf_ID8 | CT (1) | Hydrophobic integy moment. The "integy moment" is defined in analogy to the dipole moment and describes the distance of the centre of mass to the barycentre of the hydrophobic regions. A small integy moment indicates that the hydrophobic moieties are either close to the centre of mass or balanced at opposite ends of the molecule, so that their resulting barycentre is close to the centre of the molecule. VolSurf computes ID at eight different energy levels (from -0.2 to 1.6 Kcal/mol).
vsurf_CP | I-Tree (6), I-Tree (9) | Critical packing parameter. This parameter defines a ratio between the lipophilic and hydrophilic parts of a molecule: volume(lipophilic part) / [surface(hydrophilic part) x length(lipophilic part)]. Critical packing therefore refers to molecular shape as well as to the lipophilicity/hydrophilicity ratio.
vsurf_CW2 | BT (5) | Capacity factor: the ratio of the hydrophilic surface to the total molecular surface, calculated at eight different energy levels (from -0.2 to -6.0 kcal/mol).
vsurf_CW4 | I-Tree (4), I-Tree (7), I-Tree (6) | See vsurf_CW2.
vsurf_DW13 | CT (2) | Contact distances of the lowest hydrophilic energy descriptors (vsurf_EWmin) (3 descriptors).
vsurf_EDmin3 | I-Tree (6) | The lowest hydrophobic energy.
vsurf_HB1 | RF (3) | H-bond donor capacity at -2.0 Kcal/mol with carbonyl oxygen probe (8 descriptors).
vsurf_HB3 | RT (3), I-Tree (5), I-Tree (9), I-Tree (10) | H-bond donor capacity at -2.0 Kcal/mol with carbonyl oxygen probe (8 descriptors).
vsurf_HB4 | I-Tree (7) | See vsurf_HB3.
vsurf_HL1 | I-Tree (7), CT (1) | Hydrophilic-lipophilic balance: the ratio between the hydrophilic regions measured at -3 and -4 kcal/mol and the hydrophobic regions measured at -0.6 and -0.8 kcal/mol. The balance describes which effect dominates in the molecule, or whether they are roughly equally balanced.
vsurf_W1 | CT (3) | Hydrophilic volume.
vsurf_W3 | I-Tree (8) | Hydrophilic volume.
vsurf_W4 | RF (3) | Hydrophilic volume.

Table 6.6. Statistical parameters of the models for training and test sets

Model | Group | Risk Estimate | Standard Error
RT (3) | Train | 0.107 | 0.031
RT (3) | Validation | 0.583 | 0.118
I-Tree (4) | Train | 0.211 | 0.041
I-Tree (4) | Validation | 0.242 | 0.053
I-Tree (5) | Train | 0.201 | 0.026
I-Tree (5) | Validation | 0.341 | 0.087
I-Tree (6) | Train | 0.177 | 0.021
I-Tree (6) | Validation | 0.365 | 0.086
I-Tree (7) | Train | 0.213 | 0.020
I-Tree (7) | Validation | 0.268 | 0.069
I-Tree (8) | Train | 0.210 | 0.055
I-Tree (8) | Validation | 0.380 | 0.067
I-Tree (9) | Train | 0.188 | 0.033
I-Tree (9) | Validation | 0.360 | 0.096
I-Tree (10) | Train | 0.247 | 0.039
I-Tree (10) | Validation | 0.366 | 0.088
BT (5) | Train | 0.087 | 0.008
BT (5) | Validation | 0.267 | 0.085
RF (3) | Train | 0.280 | 0.043
RF (3) | Validation | 0.267 | 0.066


Table 6.7. Summary of the prediction accuracy of the RT models

Model | MAE for training set | MAE for validation set
RT (3) | 0.236 | 0.420
I-Tree (4) | 0.343 | 0.379
I-Tree (5) | 0.335 | 0.409
I-Tree (6) | 0.332 | 0.443
I-Tree (7) | 0.362 | 0.392
I-Tree (8) | 0.454 | 0.455
I-Tree (9) | 0.334 | 0.446
I-Tree (10) | 0.448 | 0.474
BT (5) | 0.242 | 0.362
RF (3) | 0.387 | 0.411


6.3.3.2. Interactive Tree Models Using Predicted OATP Effects

Interactive C&RT analysis was used here to inspect the effect of the OATPs more
closely. In these analyses, one of the most accurately predicted OATP binding
parameters (percentage inhibition) or the predicted OATP class was manually
set as the first variable in the regression tree for biliary excretion, and the
tree was then allowed to grow automatically using the features selected by the
analysis. In this way, the significance of the OATPs, namely OATP1B1, OATP1B3
and OATP2B1, in biliary excretion was examined using I-tree analysis. Table 6.8
summarises the I-tree models in terms of the type of predicted OATP effect used
in each model.
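The interactive procedure described above (a manually chosen first split variable, followed by automatic growth) can be sketched with a small recursive tree builder. This is not the STATISTICA interactive C&RT module: the stopping rule is a fixed depth, the data are hypothetical, and the function and key names are illustrative assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, variables):
    """Search the given variables for the split with the lowest total SSE."""
    best = None
    for var in variables:
        levels = sorted({row[var] for row in rows})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yv for row, yv in zip(rows, y) if row[var] <= t]
            right = [yv for row, yv in zip(rows, y) if row[var] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, var, t)
    return best


def grow(rows, y, variables, depth, forced=None):
    """Grow a regression tree; `forced` fixes the split variable of this node."""
    node = {"n": len(y), "mean": sum(y) / len(y)}
    if depth == 0 or len(y) < 2:
        return node
    found = best_split(rows, y, [forced] if forced else variables)
    if found is None:
        return node
    _, var, t = found
    node.update(var=var, threshold=t)
    for side, keep in (("left", lambda r: r[var] <= t),
                       ("right", lambda r: r[var] > t)):
        idx = [i for i, r in enumerate(rows) if keep(r)]
        # Child nodes are grown automatically from all variables.
        node[side] = grow([rows[i] for i in idx], [y[i] for i in idx],
                          variables, depth - 1)
    return node


# Hypothetical predicted OATP1B1 inhibition, surface area and log BE%.
rows = [
    {"OATP1B1-RF": 20, "vdw_area": 200},
    {"OATP1B1-RF": 25, "vdw_area": 350},
    {"OATP1B1-RF": 30, "vdw_area": 220},
    {"OATP1B1-RF": 45, "vdw_area": 400},
    {"OATP1B1-RF": 50, "vdw_area": 380},
    {"OATP1B1-RF": 60, "vdw_area": 250},
]
log_be = [0.2, 1.4, 0.3, 1.6, 1.5, 1.0]
variables = ["OATP1B1-RF", "vdw_area"]

interactive_tree = grow(rows, log_be, variables, depth=2, forced="OATP1B1-RF")
automatic_tree = grow(rows, log_be, variables, depth=2)
```

In this illustration the automatic tree would split on vdw_area first, whereas the interactive tree is made to split on the predicted OATP1B1 inhibition first, with the threshold on that variable still chosen statistically. Comparing the two trees shows what the forced variable contributes once it is placed at the root.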

Table 6.8. Brief description of the interactive C&RT models

Model no | Manually incorporated variables
I-Tree (4) | Predicted percentage OATP1B1 inhibition using OATP1B1-RF model
I-Tree (5) | Predicted percentage OATP1B3 inhibition using OATP1B3-BT model
I-Tree (6) | Predicted percentage OATP2B1 inhibition using OATP2B1-RF model
I-Tree (7) | Predicted percentage OATP1B1 and OATP1B3 inhibitions using OATP1B1-RF and OATP1B3-BT models
I-Tree (8) | Predicted OATP1B1 inhibitor/non-inhibitor class using CT (1)
I-Tree (9) | Predicted OATP1B3 inhibitor/non-inhibitor class using CT (2)
I-Tree (10) | Predicted OATP2B1 inhibitor/non-inhibitor class using CT (3)


I-Tree (4) (Figure 6.11) shows that compounds with high OATP1B1 binding, as
predicted by OATP1B1-RF, have higher biliary excretion. The statistically
selected OATP1B1-RF threshold is 37.12. In other words, compounds in the
biliary excretion dataset that have been predicted to inhibit OATP1B1 by
> 37.12% (representing stronger binding to the transporter) are predicted by
this model to have higher biliary excretion. Exceptions to this are the
compounds in node 13, which have a low hydrophilic surface ratio and a high
GCUT_PEOE_0. According to this tree, log BE% is low for the non-inhibitors of
OATP1B1 with a small van der Waals surface area (vdw_area < 297.08), especially
if they have GCUT_PEOE_0 values below -0.85 (node 9). Tables 6.6 and 6.7
provide the statistical parameters of the interactive regression trees.

[Figure 6.11: tree graph for log BE%; model: C&RT; number of non-terminal
nodes: 7; number of terminal nodes: 8.]

Figure 6.11. I-Tree (4) developed using interactive C&RT analysis using OATP1B1
descriptor as the first descriptor


When the predicted OATP1B3 effect (OATP1B3-BT) was used in the analysis, I-Tree
(5) was obtained, which is presented in Figure 6.12. According to this tree,
the 28 OATP1B3 inhibitors (> 52.10% inhibition) have a slightly lower average
log BE%. This is due to the effect of 8 compounds in this group with a low
total hydrophobic surface area (PEOE_VSA_HYD < 254.04), which have extremely
low biliary excretion (node 6). For OATP1B3 non-inhibitors, log BE% is moderate
to high if they have a high H-bond donor capacity (vsurf_HB3 > 298.22)
(terminal nodes 10, 14 and 15) or, alternatively, if they have a large
negatively charged surface area (Q_VSA_NEG > 200.31).

[Figure 6.12: tree graph for log BE%; model: C&RT; number of non-terminal
nodes: 7; number of terminal nodes: 8; root node (ID=1, N=168, Mu=1.043,
Var=0.578) split on Predicted OATP1B3-BT.]

Figure 6.12. I-Tree (5) developed using interactive C&RT analysis using OATP1B3
descriptor as the first descriptor.


Figure 6.13 presents the regression tree using the predicted OATP2B1 effect
(OATP2B1-RF) as the first split variable (I-Tree (6)). The percentage OATP2B1
inhibition predicted by the RF method for compounds in the biliary excretion
dataset ranged from -1 to 28%. According to this tree, compounds with a
percentage inhibition above 22.05 generally have higher biliary excretion,
except when the compounds are extremely weak acids or bases (fU > 0.001 at pH
7.4) and, in addition, have a large lipophilic to hydrophilic region ratio
(vsurf_CP > 0.13). On the other hand, OATP2B1 non-inhibitors are generally less
excreted through bile, unless they are large (Kier2 > 8.26), especially if they
have GCUT_PEOE_2 > 0.06 (node 11). The statistical parameters of the model can
be seen in Tables 6.6 and 6.7.

[Figure 6.13: tree graph for log BE%; model: C&RT; number of non-terminal
nodes: 7; number of terminal nodes: 8.]

Figure 6.13. I-Tree (6) developed using interactive C&RT analysis using OATP2B1
descriptor as the first descriptor


To examine the impact of different OATP subtypes in a single model, the predicted
OATP1B1 and OATP1B3 effects (OATP1B1-RF and OATP1B3-BT) were imposed at the
first and second levels of a regression tree using the interactive tree analysis
module in STATISTICA. The best model from this exercise (the most accurate in
the prediction of the external validation set) is presented in Figure 6.14
(I-Tree (7)). According to this model, compounds with inhibitory effects on both
OATP1B1 and OATP1B3 (14 compounds in node 7) have slightly higher biliary
excretion than compounds with an inhibitory effect on OATP1B1 only (compare
nodes 7 and 6). Interestingly, compounds with no binding to either of the OATPs
(compounds in node 4) may still be highly excreted through bile if they have a
high H-bond donor capacity (vsurf_HB4 > 150.18).

Figure 6.14. I-Tree (7) using predicted percentage OATP1B1 and OATP1B3
inhibition as the first and second level parameters


Interactive Tree Using Predicted Class 

We also employed the various OATP “predicted class” variables in the interactive
tree as an alternative to the “predicted percentage OATP inhibition” for the
prediction of biliary excretion. The OATP inhibitor/non-inhibitor class of the
compounds in the biliary excretion dataset was predicted using CT (1)-CT (3). In
this way, both training and validation set compounds were assigned to class one
or zero (one for inhibitor, zero for non-inhibitor). The interactive trees using
predicted OATP1B1, OATP1B3 or OATP2B1 class as the first partitioning variable
(I-Tree (8) - I-Tree (10)) are presented in Figures 6.15-6.17, respectively. The
molecular descriptors employed in the trees are explained in Table 6.1.
Statistical parameters of these tree models can be seen in Tables 6.7 and 6.8.

I-Tree (8) in Figure 6.15 shows a slightly higher average biliary excretion for
non-inhibitors of OATP1B1, which is contrary to expectations and also differs
from the result seen in I-Tree (4), which employs the percentage inhibition of
OATP1B1 predicted using RF (Figure 6.11). This may be due to poor prediction
accuracy of CT (1) for the compounds in the biliary excretion dataset, or to the
threshold of 50% inhibition used for the classification of inhibitors and
non-inhibitors. It can be noted in I-Tree (4) that a threshold value of 35.80%
(rather than 50%) was selected by the analysis to split the compounds. Figure
6.15 also shows that both classes (inhibitors and non-inhibitors) may be divided
into compounds with similarly high (nodes 5 and 7) and similarly low (nodes 4
and 6) biliary excretion using specific molecular descriptors. According to this
model, in agreement with the results seen in Chapters 4 and 5 (e.g. RT (1)),
compounds with a large hydrophilic volume (vsurf_W3 > 418) and a large
hydrophilic surface ratio (vsurf_CW4 > 0.69) are excreted more in the bile.


Tree graph for log BE%

Num. of non-terminal nodes: 3, Num. of terminal nodes: 4
Model: C&RT

ID=1: Mu=1.024, Var=0.587; split on Predicted OATP1B1 Class
    = inhibitor:      ID=2, N=48,  Mu=0.997, Var=0.618; split on vsurf_CW4
        <= 0.69:      ID=4, N=25,  Mu=0.455, Var=0.464
        > 0.69:       ID=5, N=23,  Mu=1.586, Var=0.119
    = non-inhibitor:  ID=3, N=119, Mu=1.029, Var=0.576; split on vsurf_W3
        <= 417.56:    ID=6, N=48,  Mu=0.462, Var=0.547
        > 417.56:     ID=7, N=71,  Mu=1.413, Var=0.231

Figure 6.15. I-Tree (8) using predicted OATP1B1 inhibition class as the first
parameter
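
The partition shown in Figure 6.15 can be read as a small set of nested rules.
The Python sketch below is illustrative only: the function names and the
dictionary layout are assumptions made here and are not part of the STATISTICA
output, while the thresholds, node sizes and node means are those shown in the
tree graph. It returns the terminal node and mean log BE% for one compound, and
checks that each parent mean is the size-weighted mean of its two child nodes.

```python
# Terminal nodes of I-Tree (8) as read from Figure 6.15: (node ID, N, mean log BE%).
# The data layout and function names are illustrative assumptions.
NODES = {
    ("inhibitor", "low"): (4, 25, 0.455),       # vsurf_CW4 <= 0.69
    ("inhibitor", "high"): (5, 23, 1.586),      # vsurf_CW4 > 0.69
    ("non-inhibitor", "low"): (6, 48, 0.462),   # vsurf_W3 <= 417.56
    ("non-inhibitor", "high"): (7, 71, 1.413),  # vsurf_W3 > 417.56
}


def predict_log_be(oatp1b1_class, vsurf_cw4, vsurf_w3):
    """Return (node ID, predicted log BE%) for one compound."""
    if oatp1b1_class == "inhibitor":
        branch = "high" if vsurf_cw4 > 0.69 else "low"
    else:
        branch = "high" if vsurf_w3 > 417.56 else "low"
    node_id, _, mean = NODES[(oatp1b1_class, branch)]
    return node_id, mean


def parent_mean(oatp1b1_class):
    """Size-weighted mean of the two child nodes under one predicted class."""
    children = [v for k, v in NODES.items() if k[0] == oatp1b1_class]
    total = sum(n for _, n, _ in children)
    return sum(n * mu for _, n, mu in children) / total


# A hypothetical predicted inhibitor with a high hydrophilic surface ratio.
print(predict_log_be("inhibitor", 0.75, 300.0))
print(round(parent_mean("inhibitor"), 3), round(parent_mean("non-inhibitor"), 3))
```

The two weighted means reproduce the Mu values of nodes 2 and 3 (0.997 and
1.029), which confirms that the node values recovered from the figure are
internally consistent.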


Figure 6.16 (I-Tree (9)) shows that the predicted OATP1B3 inhibitor class (node 3)
has higher biliary excretion, except for compounds with extremely weak acid or
base dissociation that are also composed mainly of lipophilic parts (vsurf_CP >
0.10). It can be seen in Figure 6.16 that the number of non-inhibitor compounds
is greater than the number of inhibitors, as predicted by CT (2) (102 vs 65).
Non-inhibitors of OATP1B3 have considerable biliary excretion (terminal nodes 15,
16 and 17) when they have a high H-bond donor capacity (vsurf_HB3 > 295) and a
more spherical shape (glob > 0.10) or, if they are not spherical, when they have
a strongly acidic group (Hmin > 0.64).

Tree graph for log BE% 

Num. of non-terminal nodes: 8, Num. of terminal nodes: 9 
Model: C&RT 



Figure 6.16. I-Tree (9) using predicted OATP1B3 inhibition class as the first 
parameter 


I-Tree (10) in Figure 6.17 shows the effect of using the predicted OATP2B1
inhibition class (by CT (3)) as the first parameter of the regression tree.
According to I-Tree (10), OATP2B1 inhibitors have higher biliary excretion,
especially if they have a high polar surface area (PEOE_VSA+4 > 19.7). On the
other hand, the 55 non-inhibitor compounds in node 4, which have a low H-bond
donor capacity, have low biliary excretion.

Tree graph for log BE%

Num. of non-terminal nodes: 3, Num. of terminal nodes: 4
Model: C&RT

Figure 6.17. I-Tree (10) using predicted OATP2B1 inhibition class as the first
parameter


6.3.3.3. Boosted Trees Model Using Predicted OATP Effects 

BT analysis was examined with various parameter settings, as explained in
Chapter 4, including various learning rates and subsample proportions, and the
best model was selected based on the internal validation set error. The selected
model (BT (5)) was obtained with an optimal number of trees of 141, a learning
rate of 0.1 and a subsample proportion of 0.50 (see Figure 6.18).

Summary of Boosted Trees
Optimal number of trees: 141; Maximum tree size: 3
[Plot: average squared error against the number of trees for the training and
test data, with the optimal number of trees marked]

Figure 6.18. Average squared error of log BE% against the number of trees in the 
boosted trees model BT (5) for the training and internal test set 

Variable importance was calculated for the BT model using the STATISTICA
software. The top 10 most important molecular descriptors of the BT (5) model are
included in Table 6.5. Lipophilicity descriptors (LogD(5.5), LogD(6.5), LogD(7.4)
and LogD(10)), hydrogen atom-level E-state descriptors (Hmax and Hmaxpos), and
vsurf and density descriptors (vsurf_CW2 and dens) are among the most important
BT (5) descriptors. Although the predicted OATP binding parameters are not
among the top 10 descriptors of the model, they appear to be very important in
this model in terms of improving the prediction accuracy for the external
validation set (Table 6.3). The previous BT models obtained from molecular
descriptors alone (BT (1) and BT (2) in Chapter 4), and the BT model using
predicted P-gp binding in addition to molecular descriptors (BT (4) in Chapter
5), have similar MAE values of 0.412, 0.417 and 0.416, respectively. BT (5)
appears to be considerably more accurate, with an MAE of 0.362.

6.3.3.4. Random Forest Model Using Predicted OATP Effects

The method for the development of a random forest (RF) model has been explained
in Chapter 4. Based on the accuracy for the internal test set, the selected RF
model (RF (3)) was obtained using a subsample proportion of 0.50, 100 trees, a
random test data proportion of 0.2, and the software’s default settings for the
stopping conditions: minimum number of cases, maximum number of levels, minimum
number in child node and maximum number of nodes of 6, 10, 5 and 100,
respectively. Figure 6.19 shows the plot of prediction error against the number
of trees. Tables 6.2 and 6.3 show the statistical significance of this model.

As with the BT model, variable importance was calculated for RF (3). The top 10
most important molecular descriptors of the model are included in Table 6.1.
These are vsurf descriptors (vsurf_W4 and vsurf_HB1), numbers of single bonds
(b_single and b_1rotN), kappa shape indexes (KierA2 and chi1v), number of atoms
(a_count) and water accessible surface area of atoms with a negative partial
charge (ASA-). Despite the absence of the predicted OATP binding parameters from
the list of the ten most important parameters, the use of these parameters in
model development has resulted in a reduction in the external validation set
error, from an MAE of 0.496 for RF (1) to an MAE of 0.411 for RF (3).

Figure 6.19. Average squared error of log BE% against the number of trees in RF
(3) for the training and internal test set


6.4. Discussion 

Although knowledge of OATP transporters has increased enormously in the
literature over the past decade, most members of the OATP sub-family remain
poorly characterised (Giacomini et al., 2010). Various members of the OATP
transporter family contribute to drug disposition and, as a result, are involved
in drug-drug interactions. A major contribution of OATP transporters to drug
disposition is through their function in hepatocytes, where they take up
substrate compounds from the blood (Fenner et al., 2012). Recently, OATP1B1
inhibition measures have been suggested as a suitable surrogate for the more
complicated human hepatic uptake assays (Soars et al., 2012). This was based on
a comparison between uptake measures in human hepatocytes (in vitro intrinsic
clearance) and IC50 values for the inhibition of OATP1B1-mediated uptake of a
model substrate for 42 compounds from several chemically distinct series. In
this investigation, the aim was to use the OATP inhibition measured in vitro for
the prediction of biliary excretion in rats.

6.4.1. QSAR Models for the Prediction of OATP Inhibition

Despite the wide distribution and important implications of the OATP transporter
family, there are unfortunately several limitations in the study of OATP
transporter ligands (Karlgren et al., 2012a). This has limited the availability
of high quality data for QSAR studies. In this investigation, the inhibition of
OATP uptake of a substrate by 225 compounds, measured as the percentage
inhibition by a single concentration of the compound (Karlgren et al., 2012a),
was used as the inhibition measure. Data were available for three major OATP
subfamilies, OATP1B1, OATP1B3 and OATP2B1. OATP1B1 and OATP1B3 are
liver-specific transporters, mainly expressed on the basolateral membrane of
human hepatocytes (Kalliokoski and Niemi, 2009; Giacomini et al., 2010), whereas
OATP2B1 is relatively ubiquitous, being localized in several tissues in addition
to the liver (Kobayashi et al., 2014; Varma et al., 2011).

After examining several regression-based prediction techniques (stepwise
regression analysis, C&RT, BT, RF and MARS), the two best models were selected
for each OATP subfamily. In addition, a classification tree was developed for
each subfamily, using 50% inhibition as the threshold value for
inhibitors/non-inhibitors.

OATP1B1 Inhibitors

For OATP1B1, RF and C&RT analysis resulted in the best prediction models
(OATP1B1-RF and OATP1B1-RT). Only one molecular descriptor is used in the
OATP1B1-RT model, Chi1_C, which is mainly an indicator of molecular size.
Despite previous investigations suggesting that ligands of this transporter are
mainly acidic (Hsiang et al., 1999), this has not been indicated in this model.
In comparison with the regression tree, the classification model for OATP1B1 (CT
(1)) has more branches and nine terminal nodes. The importance of the acidic
nature of OATP1B1 ligands is indicated in CT (1). In CT (1), in order to be
classed as inhibitors, compounds of smaller size (defined by Chi1_C < 9.68) need
to have an acidic group, shown by a partially positively charged hydrogen as in
a -COOH group (Hmin), or a high apparent partition coefficient at acidic pH
(logD(2)). The crucial impact of large molecular size for OATP ligands is very
well established from previous studies. Whereas OATs transport low MW compounds,
OATPs mediate the uptake of larger substrates such as digoxin (Shitara et al.,
2002; Hagenbuch and Meier, 2003), erythromycin (Sun et al., 2004) and
atorvastatin (Lau et al., 2006). This is also in line with a study by Hagenbuch
and Meier, which reports that compounds with molecular weight higher than 350
can be OATP1B1 substrates (Hagenbuch and Meier, 2004).

A recent QSAR model by Soars and colleagues using IC50 values for 262
proprietary compounds found that maximal hydrogen bonding strength and
lipophilicity (cLogP) were the most important molecular descriptors of their
random forest model for predicting OATP1B1 inhibitors (Soars et al., 2014). Our
random forest model also supports this finding, as lipophilicity (LogP) and the
maximum positive hydrogen atom-level E-state value in a molecule (Hmaxpos) were
dominant molecular features in the OATP1B1-RF model. In addition, CT (1) also
suggests the importance of lipophilicity (LogD(2)) for inhibitors of OATP1B1. De
Bruyn and co-workers, in a recent study, noted the polar surface area as the key
molecular feature for an increase in OATP1B1 inhibition (De Bruyn et al., 2013),
which is in agreement with CT (1) indicating the positive impact of a high
hydrophilic/lipophilic balance of the molecular surface (vsurf_HL1) and a large
negative polar surface area (PEOE_VSA_NEG) for the compounds to be classed as
inhibitors of OATP1B1.

The accuracy of the regression-based models for the external validation set is
similar to that for the training set (MAE for the percentage inhibition is ~21%).
This percentage error must be viewed in light of the innate error levels
associated with single point measurements. Karlgren et al. (2012) have developed
classification, rather than regression-based, QSAR models using this dataset.
Their classification accuracy for the training and validation sets was 73% and
79%, respectively, which is similar to that of the CT (1) model (accuracy of 81%
for inhibitors and 74% for non-inhibitors in the external validation set).
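
The per-class accuracies quoted above can be computed as in the following
sketch. It is a minimal illustration under stated assumptions: the percentage
inhibition values are invented for the example (they are not thesis data), the
function names are hypothetical, a compound at exactly 50% inhibition is
assumed to count as an inhibitor, and the accuracies for inhibitors and
non-inhibitors are assumed to correspond to sensitivity (SE) and specificity
(SP), respectively.

```python
def classify(percent_inhibition, threshold=50.0):
    """Label a compound as inhibitor (1) or non-inhibitor (0).

    Assumption: a value equal to the threshold counts as an inhibitor.
    """
    return 1 if percent_inhibition >= threshold else 0


def class_accuracies(observed, predicted):
    """Return (sensitivity, specificity, overall accuracy) for 0/1 class labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # accuracy for inhibitors
    specificity = tn / (tn + fp)   # accuracy for non-inhibitors
    overall = (tp + tn) / len(pairs)
    return sensitivity, specificity, overall


# Hypothetical measured and model-predicted percentage inhibition values.
measured_pct = [80, 65, 55, 52, 30, 20, 10, 45]
predicted_pct = [70, 60, 40, 58, 35, 55, 5, 20]

observed = [classify(v) for v in measured_pct]
predicted = [classify(v) for v in predicted_pct]
print(class_accuracies(observed, predicted))
```

Reporting the two class accuracies separately, rather than a single overall
figure, makes it visible when a model performs well for one class only.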

OATP1B3 Inhibitors

The selected regression-based models for OATP1B3 inhibition were an RF and a BT
model (OATP1B3-RF and OATP1B3-BT). Although they allow identification of the
most important features, these two methods cannot be interpreted as directly as
single classification or regression trees. CT (2) has a very low classification
accuracy for the non-inhibitors in the external validation set (36%), despite
performing well for the classification of inhibitors in the same set (83%).
Therefore, consideration must be given to the accuracy levels when interpreting
the molecular properties of inhibitors and non-inhibitors. An inspection of CT
(2) provides the required features for inhibitors, as explained in Section
6.3.2. Mainly, the inhibitors are either flexible with a relatively small
fraction of polar surface area, or they are more rigid with a large negative
polar surface area or with a specific molecular topology described by various
BCUT descriptors. The BCUT descriptors have been reported to be very useful in
terms of capturing sufficient structural detail in molecular diversity-related
tasks (Stanton, 1999; Pearlman and Smith, 1997). Despite this, the incorporation
of these parameters to explain variations in the biological properties is not
successful in this model.

As explained in the results section, the most important molecular descriptors of
OATP1B3-BT are LogD at pH 10, acidity, aromatic rings, and hydrophilicity or
hydrogen bonding descriptors. This is in agreement with the findings of De Bruyn
and co-workers, which indicate that a LogD value between 3.4 and 7.5 and a
medium/low number of hydrogen bond donors are positively correlated with OATP1B3
activity (De Bruyn et al., 2013). The most important molecular descriptors of
OATP1B3-RF are similar to those of the CT (2) model and indicate the importance
of the bond count and the number of single bonds. In addition, this model also
indicates the importance of hydrogen bonding donor capacity, molecular shape,
and volume. The prediction accuracy of the regression-based OATP1B3 models is
similar to that of the models for OATP1B1, at ~20% for the external validation
set.

OATP2B1 Inhibitors

A recent study by Shirasaka and colleagues (Shirasaka et al., 2014) on
OATP2B1-mediated uptake of pravastatin and fexofenadine showed the presence of
multiple binding sites on OATP2B1. The structure of OATP2B1 has been shown to be
very similar to OATP1B3 using in silico homology modeling studies (Meier-Abt et
al., 2005), which suggests that most OATPs share similar features. Very few
literature data are available for OATP2B1 ligands. For instance, out of 45
OATP2B1 inhibitors identified in Karlgren’s investigation, 29 compounds were
believed to be novel inhibitors not studied before (Karlgren et al., 2012a). As
a result, despite a few QSAR/pharmacophore models published for OATP1B1 (Chang
et al., 2005; De Bruyn et al., 2013; Soars et al., 2012; Karlgren et al.,
2012b), there are few in silico results available for OATP2B1 (Karlgren et al.,
2012a). Based on the similarities with other OATP transporters, it may be
speculated that OATP2B1 pharmacophores share similar molecular features for
substrate binding at the positively-charged region (El-Kattan and Varma, 2012).
Its substrates may have features such as a hydrophobic core to form the
π-stacking interaction with the imidazole ring of amino acid H579, or a hydrogen
bond donor group to directly interact with the nitrogen atom of the imidazole
ring (El-Kattan and Varma, 2012).

The selected regression-based models for OATP2B1 ligands are RF and BT models
(OATP2B1-RF and OATP2B1-BT), and CT (3) is the classification model. The CT (3)
model has correctly classified 77% and 58% of the inhibitors and non-inhibitors
in the external validation set, respectively. The accuracy of the PLS-based
classification model suggested by Karlgren et al. (2012a) for this transporter
was 75%, but they had used a different classification cut-off point of 32%. The
CT (3) model indicates that inhibitors of OATP2B1 are generally large
hydrophilic molecules, or otherwise have a specific topological property defined
by a GCUT molecular descriptor.

Both regression-based models for OATP2B1 had a prediction error of ~25% (MAE =
25 for percentage inhibition data) for the external validation set (see Table
6.3). It can be seen in the results section that both these models show the
importance of hydrogen bond donor ability through the molecular descriptors
Hmaxpos and Hmax. Moreover, the importance of polarity is shown by the polar
surface area, the negative polar surface area and the ratio of carbon atoms.

In brief, the physicochemical variables detected as important for inhibition of
each OATP sub-family show similarities, but some differences are also observed.

6.4.2. Effect of OATP Binding on Biliary Excretion Models

For hepatobiliary elimination of compounds, it has become progressively clear
that the movement of solutes and compounds into and out of cells is often
dependent on transporter proteins. After compounds enter the hepatocytes, they
either undergo metabolism, or the intact compounds or their metabolites are
excreted into the bile canaliculus. The uptake transporters enhance biliary
excretion by importing more compounds into hepatocytes. Among the various uptake
transporters, OATP family members appear to have remarkably broad substrate
specificities (Kim, 2003). In human and rat hepatocytes, the hepatic uptake of
many compounds is mediated by the OATP family. Nevertheless, the physiological
role of the OATP family is still not fully understood (Mikkaichi et al., 2004).
Varma et al. (2012), in their research paper comparing biliary excretion of
compounds and the chemical space of substrates of human OATPs and rat oatp1b2,
observed that there is a significant overlap between these substrates and
compounds with a rat biliary excretion higher than 10%.

In this investigation, the predicted OATP inhibition values were used as
parameters (predictors) for the development of QSAR models for the biliary
excretion of compounds. In assessing the effect of predicted OATP binding on the
QSAR models for biliary excretion, it must be noted that QSAR has been used for
the prediction of the OATP effect and that these original OATP QSARs are based
on percentage inhibition data, which is a fast measure of inhibition activity
but is less reliable than IC50 values.

Using C&RT embedded feature selection, only OATP1B3 inhibition is selected in
the tree structure, and even this is at the lower branches of the tree,
indicating a lower significance of the parameter (RT (3) in Figure 6.10).
Moreover, the effect seen for this parameter is in contrast to the expectation
that higher OATP binding should result in higher biliary excretion. It must be
noted here that the number of OATP binding parameters (two numerical predicted
percentage inhibition values and one categorical inhibition class for each
subfamily of OATP, making nine in total) is much lower than the number of
molecular descriptors used (more than 300 in total). This gives the molecular
descriptors a higher statistical probability of being selected by any
statistical feature selection. The OATP descriptors were therefore incorporated
in the tree structure manually using Interactive Tree analysis in STATISTICA.
Table 6.8 gives the details of the I-Tree (4) - (10) models, and Table 6.7 gives
the prediction accuracy for the training and external validation sets. Table 6.7
shows that I-Tree models (8) - (10), using the categorical predicted class
variables, are less accurate than the corresponding I-Tree (4) - (7) models
using the numerical predicted percentage OATP inhibition. This may indicate a
higher prediction accuracy of the regression-based models for the prediction of
the OATP effect of compounds in the biliary excretion dataset.
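
The argument about selection probability can be illustrated with a
back-of-the-envelope calculation. The sketch below rests on simplifying
assumptions that are not claims about the actual C&RT algorithm: every
candidate predictor is taken to be equally likely to be chosen at a split,
splits are treated as independent, and 300 is used as a stand-in for "more than
300" descriptors. The function names are hypothetical.

```python
def chance_of_oatp_split(n_oatp=9, n_descriptors=300):
    """Chance that one split picks an OATP parameter if all predictors are equally likely."""
    return n_oatp / (n_oatp + n_descriptors)


def chance_of_any_oatp_split(n_splits, n_oatp=9, n_descriptors=300):
    """Chance that at least one of n_splits independent splits picks an OATP parameter."""
    p = chance_of_oatp_split(n_oatp, n_descriptors)
    return 1.0 - (1.0 - p) ** n_splits


# One split, and a tree with 7 non-terminal nodes (as in several of the trees above).
print(round(chance_of_oatp_split(), 4))
print(round(chance_of_any_oatp_split(7), 2))
```

Under these assumptions an OATP parameter would be picked at roughly 3% of
splits, which is in line with the reasoning for imposing the OATP parameters
manually in the interactive trees.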

Among the OATP family members, the role of OATP1B1 in the elimination of
compounds has become clear over the last decade (Soars et al., 2012).
Accordingly, comparing the accuracy of I-Tree (4) - (7), it is clear that, out
of the different OATP subfamilies, incorporation of OATP1B1 inhibition results
in the most successful model (I-Tree (4) followed by I-Tree (7)). Moreover,
incorporation of the predicted OATP2B1 inhibition results in the least accurate
model (I-Tree (6)). This may be due to a lower prediction accuracy of the
original OATP2B1 model (OATP2B1-RF in Table 6.3 with MAE of 25%) rather than a
lower significance of OATP2B1 binding in hepatic uptake and biliary excretion.

It can be seen that the prediction accuracy of I-Tree (4) is better than that of
RT (3) with statistically selected variables. I-Tree (4) indicates that, in
general, OATP1B1 ligands have higher biliary excretion and, in addition to this,
eight different levels of log BE% values may be identified by this tree based on
several molecular properties. The molecular properties have been explained in
the results (Section 6.3.3.2) and are similar to the observations from Chapter
4.

The best QSAR model for the estimation of biliary excretion, using the predicted
OATP binding in addition to the molecular descriptors as the predictors, was
achieved by the boosted trees model, BT (5). BT (5), with incorporation of
predicted OATP binding effects along with molecular descriptors, is much more
accurate than the corresponding BT (1) and BT (2), with only molecular
descriptors, and BT (4), with incorporation of P-gp binding and molecular
descriptors.


Since the biliary excretion dataset is completely external and there are no OATP
data for these compounds, it is difficult to comment on the prediction accuracy
of OATP inhibition for this dataset using the QSAR models, other than through
the error indication given by the external validation set (MAEs reported in
Table 6.3 and SP and SE values in Table 6.4). In terms of the chemical space,
there seems to be a good overlap between the molecular properties of the two
training sets, as indicated by a visual inspection of the scores plot from
principal component analysis (PC1 vs PC2 plot in Figure 6.20).



Figure 6.20. The plot between the first and the second principal components of
PCA using all the molecular descriptors


In conclusion, incorporation of OATP effects in the prediction of biliary
excretion resulted in better regression tree models when they were incorporated
manually in interactive trees. Furthermore, a more accurate BT model was
achieved when OATP effects were used in addition to molecular descriptors as
predictors of biliary excretion.

7. General Conclusion

Biliary excretion is one of the major elimination routes for drugs and, as a
result, it has a major impact on pharmacokinetics, including drug half-life and
dosing regimen. Moreover, biliary excretion has implications in drug-drug and
food-drug interactions through the possible involvement of the same transporter
proteins. As a result, early estimation of biliary excretion may be useful for
the modification of drug structure in drug design to obtain an ideal drug, and
can be used as a surrogate for more time-consuming and expensive in vivo and in
vitro studies. In this project, we were able to estimate rat biliary excretion
based on physicochemical properties using various computational modelling
techniques. In addition, the roles of P-gp and OATPs, as two important
hepatobiliary efflux and influx transporters, respectively, were investigated
using QSAR.

The statistical techniques used for QSAR development included a range of linear,
non-linear and ensemble methods to allow the best possible prediction accuracy.
The methods were multiple linear regression analysis, decision trees developed
by C&RT and CHAID, MARS, and ensemble decision trees developed by the random
forest and boosted trees methods. Simple models such as classification or
regression trees, multiple regression analysis and MARS use a manageable number
of features and allow for easy interpretation of the results. In this way, the
selected molecular descriptors provided some insight into the major factors that
can affect biliary elimination of drugs.

The biliary excretion dataset used in this project consisted of a diverse set of
217 compounds with the percentage of dose excreted intact into bile measured in
vivo in rat. The first aim of the investigation was to develop a predictive QSAR
model for this dataset. Table 7.1 gives a brief summary of the prediction
accuracy of all the biliary excretion models described in this thesis. The most
accurate models in terms of the prediction accuracy for the external validation
set, in descending order of accuracy, are CHAID (2), BT (5), RT (1) and I-Tree
(4). This shows that simple regression trees such as CHAID (2) and RT (1) are as
powerful in the prediction of biliary excretion as the more sophisticated
ensemble methods of boosted trees and random forest.

Table 7.1 MAE values of all the biliary excretion models described in the thesis;
the selected models have been highlighted in bold.


Model          Training set    Validation set
BT (1)         0.229           0.412
BT (2)         0.226           0.417
BT (4)         0.339           0.416
BT (5)         0.242           0.362
CHAID (2)      0.432           0.359
I-Tree (1)     0.345           0.451
I-Tree (10)    0.448           0.474
I-Tree (2)     0.424           0.468
I-Tree (4)     0.343           0.379
I-Tree (5)     0.335           0.409
I-Tree (6)     0.332           0.443
I-Tree (7)     0.362           0.392
I-Tree (8)     0.454           0.455
I-Tree (9)     0.334           0.446
MARS (2)       0.438           0.428
MARS (3)       0.436           0.442
MLR (1)        0.377           0.483
RF (1)         0.403           0.496
RF (3)         0.387           0.411
RT (1)         0.304           0.373
RT (3)         0.236           0.420
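
The ranking quoted in the text can be reproduced directly from the values in
Table 7.1. The following sketch is illustrative: the MAE values are copied from
the table, while the dictionary layout and function names are assumptions made
here. It sorts the models by validation set MAE and expresses the gain from
adding the predicted OATP effects as a relative reduction in MAE.

```python
# (training MAE, validation MAE) in log BE% units, copied from Table 7.1.
MAE = {
    "BT (1)": (0.229, 0.412), "BT (2)": (0.226, 0.417), "BT (4)": (0.339, 0.416),
    "BT (5)": (0.242, 0.362), "CHAID (2)": (0.432, 0.359),
    "I-Tree (1)": (0.345, 0.451), "I-Tree (10)": (0.448, 0.474),
    "I-Tree (2)": (0.424, 0.468), "I-Tree (4)": (0.343, 0.379),
    "I-Tree (5)": (0.335, 0.409), "I-Tree (6)": (0.332, 0.443),
    "I-Tree (7)": (0.362, 0.392), "I-Tree (8)": (0.454, 0.455),
    "I-Tree (9)": (0.334, 0.446), "MARS (2)": (0.438, 0.428),
    "MARS (3)": (0.436, 0.442), "MLR (1)": (0.377, 0.483),
    "RF (1)": (0.403, 0.496), "RF (3)": (0.387, 0.411),
    "RT (1)": (0.304, 0.373), "RT (3)": (0.236, 0.420),
}


def rank_by_validation(mae):
    """Model names ordered from lowest to highest validation set MAE."""
    return sorted(mae, key=lambda name: mae[name][1])


def relative_reduction(reference, improved):
    """Fractional reduction in MAE relative to a reference model."""
    return (reference - improved) / reference


ranking = rank_by_validation(MAE)
print(ranking[:4])
print(round(100 * relative_reduction(MAE["BT (1)"][1], MAE["BT (5)"][1]), 1))
print(round(100 * relative_reduction(MAE["RF (1)"][1], MAE["RF (3)"][1]), 1))
```

The first four entries are CHAID (2), BT (5), RT (1) and I-Tree (4), matching
the order given in the text; the validation MAE falls by about 12% from BT (1)
to BT (5) and by about 17% from RF (1) to RF (3).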


4799 

4800 

4801 From these models, we obtained an insight into the structural profile of cholephilic 

4802 compounds through accurate modelling of the biliary excretion. Molecular 

4803 descriptors selected by all these models including the top ten incorporated in 

4804 boosted trees and random forest models indicated a higher biliary excretion for 

4805 relatively hydrophilic compounds especially if they have acid/base dissociation 

4806 (anionic or cationic), and have a large molecular size. 

4807 Interactive regression trees analysis was a very useful tool that helped investigate 

4808 the effects of specific properties. One such property with regards the previous 
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4809 literature was molecular weight. Despite the established role of molecular weight in 

4810 biliary excretion, the molecular weight thresholds in previous literature are 

4811 generally based on qualitative inference from available data, rather than a 

4812 statistically established threshold (Yang et al., 2009). In this project a statistically 

4813 validated molecular weight threshold established for significant biliary excretion at 

4814 MW = 348 Da. 

Analysis of outliers in the majority of the models in Chapter 4 showed that the models perform best when lipophilicity is not too extreme (log P < 5.35) and for compounds with molecular weight above 280 Da. It was also observed that compounds with low biliary excretion are more likely to show a higher average error. This could be attributed, at least in part, to the method used for the calculation of error. For example, when a low biliary excretion of 1% is predicted for a compound with an observed value of 0.1%, the tenfold difference leads to a high absolute error of 1 on the logarithmic scale. Such estimations may still be acceptable, as these low biliary excretion compounds were all estimated to have a BE% value < 4%.
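
The arithmetic behind this observation can be checked with a short sketch, assuming that the error is computed as the absolute difference between observed and predicted log10 BE% values. The function name `abs_log_error` and the second (high excretion) example are hypothetical.

```python
import math


def abs_log_error(observed_pct, predicted_pct):
    """Absolute error between observed and predicted BE% on the log10 scale."""
    return abs(math.log10(predicted_pct) - math.log10(observed_pct))


# Example from the text: observed 0.1%, predicted 1%.
low = abs_log_error(0.1, 1.0)

# Hypothetical high excretion compound with the same 0.9 percentage-point gap.
high = abs_log_error(50.0, 50.9)

print(f"low BE% compound:  {low:.3f} log units")
print(f"high BE% compound: {high:.3f} log units")
```

The same difference of 0.9 percentage points gives an error of about 1 log unit for the low excretion compound but less than 0.01 log units for the high excretion one, which is why low BE% compounds tend to dominate the average error.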

P-gp is a major efflux pump that operates in hepatocytes and aids the excretion of its substrates into bile. Based on the hypothesis that the substrates of this transporter may have a higher tendency to be excreted through bile, this project looked at the structural features of P-gp ligands. A very accurate measure of ligand binding to proteins is the inhibition constant (Ki). Ki is believed to be a more universal parameter, allowing easy comparison of data from different substrate conditions. To investigate the molecular requirements of P-gp binding and the effect of P-gp binding on the biliary excretion levels of compounds, a dataset of 219 unique P-gp inhibitor/substrate pairs was collated from the original literature. QSAR models were developed for Ki using P-gp-ligand docking scores as well as the molecular descriptors of the inhibitors and the descriptors of the probe substrates used for the determination of Ki values. The QSARs indicate that the molecular descriptors are more significant in the prediction of P-gp binding than the ligand-enzyme docking scores. The QSAR models indicate that potent inhibitors of P-gp have higher lipophilicity and molecular size than lead-like compounds as defined by Oprea, and that the limiting lipophilicity is log P > 5.3 for this dataset. The classification and regression tree (C&RT) model had the lowest Ki prediction error for the external validation set, with a mean absolute error of 0.543.


Although the QSARs established for P-gp had reasonable accuracy for the prediction of Ki values of the external validation set, these predictions may not be as reliable for the external compounds in the biliary excretion dataset. This can occur if the compounds in the biliary excretion dataset are outside the domain of applicability of the QSAR models for P-gp binding. A scores plot from PCA showed a considerable difference between the chemical spaces of the two datasets. Therefore, it was not unexpected that the predicted P-gp inhibition constant could not significantly improve the prediction accuracy of the QSAR models for biliary excretion.
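
A simple way to flag compounds that fall outside a model's domain of applicability is a descriptor range check, sketched below. This is a deliberately simpler technique than the PCA scores plot used in this work: a compound is considered inside the domain only if every descriptor lies within the range spanned by the training set. The function names and all descriptor values are hypothetical.

```python
def descriptor_ranges(training_rows):
    """Per-descriptor (min, max) over the training compounds."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(row, ranges):
    """True if every descriptor of the compound lies inside the training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(row, ranges))


# Hypothetical descriptor rows: (molecular weight in Da, log P).
pgp_training = [(310.0, 2.1), (455.0, 3.8), (520.0, 4.9), (602.0, 5.3)]
be_compounds = [(180.0, -0.4), (480.0, 3.2), (890.0, 1.1)]

ranges = descriptor_ranges(pgp_training)
flags = [in_domain(row, ranges) for row in be_compounds]
coverage = sum(flags) / len(flags)
print(f"{coverage:.0%} of the query compounds fall inside the training ranges")
```

A low coverage value would suggest, as observed here for the P-gp models, that predictions for the second dataset should be treated with caution.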

OATPs are major uptake transporters that mediate the uptake of a wide range of compounds from blood into hepatocytes as the first step of the hepatobiliary elimination process. To study the significance of OATP binding in biliary excretion, a recently published dataset consisting of the percentage inhibition of three OATP subtypes, OATP1B1, OATP1B3 and OATP2B1, by 225 compounds was employed. Despite the lower quality of this binding measure in comparison with IC50 or Ki, QSARs of reasonable accuracy (MAE of 20-25%) were established for the three OATP subtypes. In addition, a classification method, i.e. classification tree, was also used. Both regression and classification methods were most successful for the prediction of OATP1B1 binding when compared to OATP1B3 and OATP2B1 binding. This may be attributed to a more balanced inhibitor/non-inhibitor ratio in the dataset for this particular OATP. The results showed that large hydrophilic compounds with hydrogen bond donor ability (such as carboxylic acid groups) are better inhibitors of OATP1B1 and OATP2B1, while flexibility was an additional factor for OATP1B3.

A comparison of the chemical spaces of the compounds in the OATP dataset with those of the compounds in the biliary excretion dataset using PCA indicated a good overlap of properties. The OATP models were used for the prediction of OATP binding of the compounds in the biliary excretion dataset, and the predicted values were used as additional parameters for the estimation of biliary excretion using QSAR. Although the majority of these predicted OATP binding parameters were not picked by the C&RT algorithm, and they were not ranked within the top ten most important features of the BT or RF models, they were important in improving the prediction accuracy of the BT model and of the regression trees when they were incorporated manually using interactive trees. In the selected I-Tree model, the predicted OATP1B1 binding was the most significant parameter, and this constitutes one of the best models overall for the prediction of biliary excretion, with an absolute error of 0.38 (I-Tree (4), Table 7.1). The BT model has a slightly lower prediction error of 0.36 for the external validation set.

8. Future Work

As a result of the research carried out in this PhD project, it can be seen that there is a need to further explore the role of individual ABC transporters as efflux pumps. In addition to the role and impact of efflux drug transporters in the hepatocyte, further investigation of the impact of both uptake and other efflux hepatic transporters on biliary excretion, together with a search for new datasets for transporters relevant to biliary excretion, such as PEPT1, PEPT2, MRP2, MRP3, MRP4, MRP6, MATE1, OAT2, OAT7, OCT1, NTCP, BSEP, PHT1 and PHT2, could elucidate the elimination pathways more clearly.

In terms of P-glycoprotein, there are large datasets of the substrate/non-substrate type, some of which are proprietary and some (smaller datasets) are available in the literature (Wang et al., 2011; Broccatelli et al., 2011). Although the data are categorical, which is not ideal, the chemical space of these datasets may be closer to that of the compounds in the biliary excretion dataset. In addition to P-gp, two other efflux pumps are also very important in biliary excretion. These are MRP2 and BCRP, which are highly localised in hepatocytes. The work may involve cutting-edge QSAR models along with classic QSAR model development, as well as drug-enzyme docking methods. These transporter enzymes have also been indicated to play roles in anticancer drug resistance and in pharmacokinetic processes such as intestinal absorption and blood-brain barrier transport. Therefore, the models will be useful from perspectives other than biliary excretion as well.

The lack of high-resolution structures of several important transporters, including P-glycoprotein and OATPs, has severely limited work in this field. For example, if higher resolution models of P-glycoprotein were made available, this may improve the docking energies and allow us to visualise the interactions between P-glycoprotein and compounds. In terms of P-gp docking, in this work the binding pocket was defined using the location of a single co-crystallised ligand. P-gp is known to have several binding sites and can accommodate more than one ligand at a time. A more detailed investigation may look at docking at several binding sites and then, from a QSAR perspective, the lowest energy binding could be selected for each compound from the various binding sites to be used as a QSAR parameter. In addition, building structure-based pharmacophore models of P-glycoprotein, especially with the pharmacophore features of hydrophobic groups, aromatic rings, hydrogen bond acceptors or donors, cations and anions, can be helpful.
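
The multi-site docking suggestion above could be turned into a QSAR parameter along the following lines. This is only a sketch: the compound names, site labels and docking energies are hypothetical, and the function `lowest_energy_per_compound` is not part of any docking package.

```python
def lowest_energy_per_compound(docking_scores):
    """Map each compound to its (best_site, lowest_energy) pair."""
    return {
        compound: min(site_scores.items(), key=lambda item: item[1])
        for compound, site_scores in docking_scores.items()
    }


# Hypothetical docking energies (kcal/mol) at three assumed P-gp binding sites.
docking_scores = {
    "compound_A": {"site_1": -7.2, "site_2": -8.9, "site_3": -6.4},
    "compound_B": {"site_1": -9.1, "site_2": -7.7, "site_3": -8.0},
}

qsar_parameter = lowest_energy_per_compound(docking_scores)
for compound, (site, energy) in qsar_parameter.items():
    print(f"{compound}: {energy} kcal/mol at {site}")
```

Keeping the site label alongside the energy makes it possible to check later whether potent inhibitors cluster at one particular binding site.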

In order to further confirm the external applicability and predictive ability of the models built in this study as good predictors of P-glycoprotein and OATP binding and of biliary excretion, new sets of compounds should be used as external validation sets to test the constructed models. An important practice, which should be carried out for the models presented in this thesis, is to investigate the diversity of the compounds in the datasets and to define the applicability domain of the models.

Furthermore, it will be pertinent to ensure that the datasets are robust. For example, for P-glycoprotein substrates, the quality of the methods used for the measurement of activity should be scrutinised, and several sources of data should be compared to check whether the compounds to be used for model building have been consistently identified in several studies as either substrates or non-substrates of P-glycoprotein.

Apart from the continuous and classification computational methods used in this study for the estimation of biliary excretion, other statistical techniques can be utilized to predict biliary excretion, e.g. neural networks, support vector machines and semi-supervised learning. Neural networks and support vector machines can be used as helpful alternatives for both prediction (regression) and classification problems. Semi-supervised learning is a class of machine learning techniques that make use of unlabelled data, alongside labelled data, for training and has emerged as an exciting new direction in machine learning research. For example, in the biliary excretion dataset, when biliary excretion values are converted to log BE%, the values are missing for the nine compounds with zero biliary excretion, as the logarithm of zero is undefined. Semi-supervised learning methods can improve model generalizability and applicability by predicting the values for these compounds.
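
The way zero excretion values drop out of the log-transformed data can be sketched as follows. The three BE% values are taken from Appendix I, while the function name `log_be` and the split into labelled and unlabelled compounds are illustrative assumptions about how the data might be prepared for a semi-supervised method.

```python
import math


def log_be(be_percent):
    """Return log10(BE%), or None when BE% is zero and the log is undefined."""
    return math.log10(be_percent) if be_percent > 0 else None


# BE% values as listed in Appendix I.
be_values = {"Diazepam": 0.00, "Benzoic acid": 0.09, "Digoxin": 84.4}

transformed = {name: log_be(value) for name, value in be_values.items()}

# Compounds with a defined log BE% act as labelled data; the rest are unlabelled.
labelled = {name: v for name, v in transformed.items() if v is not None}
unlabelled = [name for name, v in transformed.items() if v is None]

print("labelled:", labelled)
print("unlabelled:", unlabelled)
```

In this framing the unlabelled compounds still contribute their molecular descriptors to training, even though their response value is missing.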

In this investigation, we searched for biliary excretion or clearance data for other species before the analysis of the rat biliary excretion database. For human, we could collect biliary excretion data for 68 compounds. There are also some biliary excretion data available for dog and rabbit. However, we did not analyse these datasets owing to the limited number of compounds in them. As a part of the future work of this thesis, to cope with the lack of a human biliary excretion dataset, we suggest the extrapolation of human pharmacokinetic parameters mainly from rat data (but also from dog and monkey data).

It should be noted that the uptake of drugs via the sinusoidal membrane and drug efflux by transporters is a complicated process. Further studies of transporter-mediated drug-drug interactions in the hepatocyte, additional investigation of in silico and in vitro transporter methods, and the linking and utilisation of the pharmacokinetic parameters that affect the net hepatic clearance, such as area under the curve (AUC), excretion rate and ratio, and half-life, are necessary and can elucidate the overall elimination process in the hepatocyte.

The relationship between biliary excretion and hepatic metabolism is beyond the scope of the present study. However, its investigation should be possible with more data on metabolism and with statistical techniques such as partial least squares (PLS) regression, which allows more than one variable to be predicted at the same time.

Finally, the biliary excretion, OATP, Ki, Km and IC50 datasets can be populated with more data as they become available in the literature.

10. Appendix

Appendix I. Percentage of compound’s dose excreted intact through the bile in rats and the relevant references


Compounds 

BE% 

Reference 

1,2,3,6-Tetrahydrophthalylsulphathiazole

45.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7.
Female Wistar albino rats (180-350g body wt.) 

17-AAG(NSC 330507) 

2.00 

Musser SM, et.al. Cancer Chemother Pharmacol. 2003 
Aug;52(2):139-46. Male Fischer 344 rats (7-8 weeks of 
age and weighing 220-234g) 

17-DMAG (NSC 707545) 

2.38 

Egorin MJ, et.al. Cancer Chemother Pharmacol. 2002 
Jan;49(1):7-19. Male Fischer 344 rats (7-8 weeks of
age). % of Dose (in total): 4.7 ± 1.4. Parent drug 
accounted for 50.7 ± 3.4% of that. 

2-Aminotoluene-5-sulphonic acid

0.27 

McMahon KA, et.al. Food Cosmet Toxicol. 1969 
Sep;7(5):497-500. Rats (250-350 g body weight) 

2-Ethylsulphanilic acid 

0.29 

McMahon KA, et.al. Food Cosmet Toxicol. 1969. 
Sep;7(5):497-500. Rats (250-350 g body weight) 

4-Glucuronosido-4'-hydroxybiphenyl

92.00 

Millburn P., et al, Biochem. J. 1967; 105, 1275 
Female Wistar albino rats (weighing 200 ± 10 g.)

4-Glucuronosidobiphenyl 

59.00 

Millburn P., et al, Biochem. J. 1967; 105, 1275 
Female Wistar albino rats (weighing 200 ± 10 g.)

5-fluorouracil (5-FU) 

0.40 

Young D, et.al. Nuklearmedizin. 1982 Feb;21(1):1-7.
Male Fischer rats weighing 150 - 200g 

7-Hydroxymethotrexate 

37.00 

Lutz Fahrig, Helmut Brasch, et al. Cancer Chemother 
Pharmacol( 1989)23, 156-160 

9-nitro-20(S)-camptothecin (Rubitecan)

9.10 

Zhong DF, et.al. Acta Pharmacol Sin. 2003 
Mar;24(3):256-62. Wistar rats (250 ± 20g) 

Acetaminophen(paracetamol) 

0.80 

Ghanem CI, et.al. J Pharmacol Exp Ther. 2005
Dec;315(3):987-95. Male Wistar rats (250-290 g)
Savina PM, et.al. Drug Metab Dispos. 1992 Jul-
Aug;20(4):496-501. Male Sprague-Dawley rats (266-282 g).

Actinomycin D 

31.00 

Wosilait WD, et.al. Life Sci I. 1971 Sep 15;10(18):1051-5
Male Sprague-Dawley rats, weighing about 300 g.

Adipylsulphathiazole

40.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt.) 

Aprepitant 

7.00 

Huskey SE, et.al. Drug Metab Dispos. 2004 
Feb;32(2):246-58. Male SD rats ( 230-300 g) 

Azithromycin 

9.60 

Sugie M, et.al. Antimicrob Agents Chemother. 2004 
Mar;48(3):809-14. Male Wistar Rats, 260 - 270g. Male 
Sprague-Dawley rats (normal rats) (260 to 280g) 

Belotecan 

28.29 

Namkoong EM, et.al. Arch Pharm Res. 2007 
Nov;30(11):1482-8. Male SD rats (260 - 290g)

Benzoic acid 

0.09 

Abou-el-makarem M.M., Millburn P, et al Biochem. 
J.(1967)105, 1269 

Beta-methyldigoxin 

53.00 

Funakoshi S., Murakami T, et al, J Pharm Sci. 
(2005)94(6), 1196-203 

Bishydroxycoumarin

1.88 

Buttar HS, et.al. Br J Pharmacol. 1973 Jun;48(2):278-87.
Male Albino rats (Wistar, 275 - 355g). % of Dose
(in total): 12.3 ± 2.7, Parent drug accounted for 15.3
(12.5-18.4)% of that.

BMS-182874 

0.90 

Chong S, Obermeier M et al. 2003. Arch Pharm Sci
26:89-94. 

BMS-187345 

4.50 

Chong S, Obermeier M et al. 2003. Arch Pharm Sci
26:89-94. 

BMS-387032 

11.00 

Kamath AV, Chong S et al. 2005. Cancer Chemother
Pharmacol 55:110-116.

BQ-123 

52.82 

Kato Y, et.al. J Pharmacol Exp Ther. 1999 
Feb;288(2):568-74. Male Sprague-Dawley rats 
weighing approximately 250 to 300g . 

Nakamura T, et.al. J Pharmacol Exp Ther. 1996 
Aug;278(2):564-72. Male Sprague-Dawley rats, 7 to 10 
weeks of age. 

Niinuma K, Kato Y, et al. Am J Physiol. 1999 ;276(5 Pt 
1)1153-1164. 

BQ-485 

97.40 

Kato Y, et.al. J Pharmacol Exp Ther. 1999 
Feb;288(2):568-74. Male Sprague-Dawley rats 
weighing approximately 250 to 300g 

BQ-518 

89.70 

Kato Y, et.al. J Pharmacol Exp Ther. 1999 
Feb;288(2):568-74. Male Sprague-Dawley rats 
weighing approximately 250 to 300g 

Bretylium 

16.00 

Kuntzman R, et.al. Clin Pharmacol Ther. 1970 Nov- 
Dec;11(6):829-37

Bromochlorophenol blue 

89.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt.) 

Bromocresol Green 

73.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt.) 

Bromophenol Blue 

67.25 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt.) 
Wills RJ, et.al. J Pharm Sci. 1983 Oct;72(10):1127-31
Fasted male Sprague-Dawley rats (260 - 470g) 

Buprenorphine 

1.08 

Brewster D, et.al. Xenobiotica. 1981 Mar;11(3):189-96.
Adult SD rats (200-300g). % of Dose (in total): 92.9 ±
8.0. Parent drug accounted for 1.5 ± 0.8% of that
(male).
% of Dose (in total): 94.5 ± 2.8, Parent drug accounted
for 0.8 ± 0.4% of that (female).

Butoprozine 

0.00 

Overzet F, et.al. Xenobiotica. 1985 Jan;15(1):1-10.
male Wistar rats (body wt. 300g) 

Cadralazine

3.70 

Eur J Drug Metab Pharmacokinet. 1983;8(1):25-33.
Male and female Sprague Dawley rats with an average 
body weight of 150 to 180 g. 

Camptothecin (carboxylate form)

36.40 

Scott DO, et.al. Drug Metab Dispos. 1994 May- 
Jun;22(3):438-42. Male Sprague-Dawley rats weighing 
between 250-300g. 

Guarino AM, et.al. Cancer Chemother Rep. 1973
Apr;57(2):125-40. Male Sprague-Dawley rats (240 - 320g)

Camptothecin (lactone form) 

7.50 

Scott DO, et.al. Drug Metab Dispos. 1994 May- 
Jun;22(3):438-42. Male Sprague-Dawley rats weighing 
between 250-300g. 

Carbovir 

1.30 

Zimmerman CL, et.al. Drug Metab Dispos. 1993 Sep- 
Oct;21(5):902-10. Sprague-Dawley rat 

Cefamandole 

33.00 

Wright WE, et.al. Antimicrob Agents Chemother. 1980 
May;17(5):842-6. Male Wistar rats, weighing 350 to 
500g 

Cefazedone 

37.40 

Sailer H, et.al. Arzneimittelforschung.
1979;29(2a):404-11
Male and female Wistar-WU rats (weight range 175-
320g).

Cefazolin 

30.00 

Tsuji A, et.al. J Pharm Sci. 1983 Nov;72(11):1239-52.
Male Wistar rats (240g) 

Cefbuperazone (T-1982) 

80.00 

Saikawa I, et.al. Jpn J Antibiot. 1982 Sep;35(9):2163- 
73 

Cefixime 

40.80 

Yasui H, et.al. J Pharm Sci. 1994 Jun;83(6):819-23 
Male Wistar rats (177 - 230g).

Cefmenoxime (SCE-1365) 

28.50 

Tanayama S, et.al. Antimicrob Agents Chemother. 
1980 Oct;18(4):511-8. male or female Sprague-Dawley
rats weighing 220 to 515g. 

Cefmetazole 

36.25 

Eur J Drug Metab Pharmacokinet. 1992 Jul-
Sep;17(3):167-73. Male Wistar: 232-298g. 

Cefodizime 

28.60 

Matsushita H, et.al. J Pharmacol Exp Ther. 1992
Feb;260(2):499-504. Male Wistar rats weighing 240 to
280g.

Cefoperazone 

85.60 

Saikawa I, et.al. Jpn J Antibiot. 1980 Oct;33(10):1084- 
96 

Cefotetan (YM-09330) 

48.00 

Komiya M, et.al. Antimicrob Agents Chemother. 1981 
Aug;20(2):176-83. SD rats: 200 - 350g. 

Mizojiri K, et.al. Antimicrob Agents Chemother. 1987 
Aug;31(8):1169-76 

Cefpiramide (SM-1652) 

59.80 

Matsui H, et.al. Antimicrob Agents Chemother. 1982 
Aug;22(2):213-7. Male Sprague-Dawley rats (200 to 
250 g). 

Imasaki H, et.al. Antimicrob Agents Chemother. 1983
Jul;24(1):42-7. Sprague-Dawley male rats weighing
150 to 300g.

Muraoka I, et.al. Antimicrob Agents Chemother. 1995 
Jan;39(1):70-4. 20-week-old healthy SDR (weight, 494
to 540 g) 

ceftriaxone 

61.80 

Matsui H, et.al. Antimicrob Agents Chemother. 1984 
Aug;26(2):204-7. male SD rats (body weight, 200 to 
250 g) 

celiptium (NSC-264137) 

6.10 

Maftouh M, et.al. Xenobiotica. 1983 May;13(5):303-10 
Male SD rats (300 - 350g). 

Cephalexin 

2.50 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

Cephradine 

27.30 

Moriwaki T, Yasui H and Yamamoto A. 2003. J
Pharmacokinet Pharmacodyn 30:119-144.

Chenodeoxycholate (CDC) 

0.30 

Takikawa H,et.al. Hepatology. 1991. 14(2):352-60. 
Male SDRs weighing about 270g. % of dose (in total): 
~ 3% at steady state. Parent drug accounted for 6 -10% 
of that. 

Ciprofloxacin 

9.92 

Yamaguchi H, et.al. Pharm Res. 2004 Feb;21(2):330-8. 
Male Wistar rats weighing 200-250g 

Colchicine 

25.36 

Hunter AL, et.al. J Pharmacol Exp Ther. 1975
Mar;192(3):605-17.
Male Thorp SD rats (350-390g). % of dose (in total):
52. Parent drug accounted for 53% of that.
Speeg KV, et.al. Cancer Chemother Pharmacol.
1994;34(2):133-6. Male Sprague-Dawley rats weighing
300-400g.
Speeg KV, et.al. Hepatology. 1992 May;15(5):899-903.
Male SD rats weighing 300 to 400g. CLsys: 43.05 ±
2.68 ml/min/kg. CLbiliary: 11.62 ± 0.84 ml/min/kg.
Kitani K, et.al. Tohoku J Exp Med. 1981 Apr;133(4):389-
97. Male Wistar rats (300g on the average). % of dose
(in total): 35.19 ± 2.91. Parent drug accounted for 70.82
± 7.79% of that.

Compound I (Merck) 
Diastereomer 

13.00 

Prueksaritanont T, et.al. Xenobiotica. 2003
Nov;33(11):1125-37. Male Sprague-Dawley (SD) rats
(200-320g).
Prueksaritanont T, et.al. Xenobiotica. 2002;32(3):207-220.
Male Sprague-Dawley (SD) rats (230-320g).

Compound II (Merck)
Diastereomer 

58.00 

Prueksaritanont T, et.al. Xenobiotica. 2003
Nov;33(11):1125-37. Male Sprague-Dawley (SD) rats
(200-320g).
Prueksaritanont T, et.al. Xenobiotica. 2002;32(3):207-220.
Male Sprague-Dawley (SD) rats (230-320g).

Cosalane 

1.12 

Kuchimanchi KR, et.al. Drug Metab Dispos. 2000 
Apr;28(4):403-8. Male SD rats weighing 200 to 225 g 

CP-671,305 

48.33 

Kalgutkar AS, et.al. Xenobiotica. 2004 Aug;34(8):755- 
70 

Male and female Sprague-Dawley rats (220-250g) 
Kalgutkar AS, et.al. Drug Metab Dispos. 2007 
35(11):2111-8. Male Sprague-Dawley rats (230-250g) 

Cromoglycate 

71.40 

Ashton MJ, et.al. Toxicol Appl Pharmacol. 1973 
Nov;26(3):319-28 

Male Sprague-Dawley rats (200 - 250g) 

DA-5018 (Capsavanil) 

3.06 

Shim HJ, et.al. J Chromatogr B Biomed Sci Appl. 1997 
Feb 21;689(2):422-6. 

Dasatinib 

10.40 

Christopher LJ, et.al. Drug Metab Dispos. 2008
Jul;36(7):1341-56.
Male Sprague-Dawley rats weighing approximately 340
to 380g.

G.Luo, S.Johnson, et al. Drug Metab Dispos. J. 
(2010)38,422-430 

Daunorubicin 

11.76 

Yesair DW, et.al. Cancer Res. 1972 Jun;32(6):1177-83
Male Sprague-Dawley rats (350 to 500 g). Amount
excreted into bile: ~ 500 µg. Dose: 10 mg/kg.

Decamethonium bromide 

1.00 

Hughes R.D., Millburn P., et al, Biochem. J. (1973)136, 
979-984 

Diazepam 

0.00 

Inaba T, et.al. Drug Metab Dispos. 1974 Sep- 
Oct;2(5):429-32. Male Wistar rats (280-320 g).% of 
Dose (in total): 77; No intact diazepam could be 
detected in bile. 

Dibenzyldimethylammonium iodide

36.00 

Hughes RD, Millburn P. et al, Biochem. J. (1973)136, 
967-78 

Diclofenac 

2.99 

Peris-Ribera JE, et.al. J Pharmacokinet Biopharm. 1991
Dec;19(6):647-65. Male Wistar rats (320-380 g). % of 
Dose (in total): 27.2; Parent drug accounted for 4.7% of 
that. 

Diethylmethylphenylammonium iodide

7.60 

Hughes RD, Millburn P. et al, Biochem. J. (1973)136, 
967-78 

Digoxin 

84.4 

Song S, et.al. Drug Metab Dispos. 1999 Jun;27(6):689-94
Female Sprague-Dawley (SD) rats weighing 220 to
270g.
Funakoshi S, Murakami T, et al, J Pharm Sci.
(2005)94(6), 1196-203
Fukuda H, Ohashi R, et al, Drug Metab Dispos. 2008
Jul;36(7):1275-82

Dimethyltubocurarine iodide 

17.00 

Hughes RD., Millburn P, et al, Biochem. J. (1973)136,
979-984.

DNP-NAC 

42.00 

Hinchman CA, Rebbeor JF et al. 1998. Am J Physiol
275(4 Pt 1):G612-9.

DNP-SG (2,4-Dinitrophenyl-S-glutathione)

100 

Niinuma K, Kato Y, et al American journal of 
physiology: Gastrointestinal & liver physiology, 1999 
;276(5 Pt 1)1153-1164. 

Doxorubicin 

18.26 

Vaidyanathan S, et.al. Cancer Chemother Pharmacol.
2000;46(3):185-92. Female Sprague-Dawley rats
weighing 225 to 250g.
Krishna R, et.al. Clin Cancer Res. 1999
Oct;5(10):2939-47. Male SD rats, 225-275g.
Broggini M, et.al. Cancer Treat Rep. 1980. 64(8-
9):897-904. CD-COBS male rats (body weight, 200 ±
20 g)

Israel M, et.al. Cancer Res. 1978 Feb;38(2):365-70. 
Male SD rats weighing 320 to 440 g. % of Dose (in 
total): 20; Parent drug accounted for 80% of that. 

DPDPE 

80.00 

Chen C, et.al. Pharm Res. 1997 Mar;14(3):345-50 
Male Sprague-Dawley rats (250-300g)

Drotaverine 

0.00 

Vargay Z., Simon G., et al. Eur J Drug Metab 
Pharmacokinet. 1980;5(2):69-74 

E3040 glucuronide 

90.00 

Niinuma K., Kato Y, et al American journal of 
physiology: Gastrointestinal & liver physiology, 1999 
;276(5 Pt 1)1153-1164. 

Takenaka O, Horie T, Suzuki H, Sugiyama Y, J 
Pharmacol Exper Ther. 280(2), 948-958. Male SD rats 
(250-330 g) from Japan Laboratory Animals Inc. 
Hirouchi M et al, Drug Metab Disp. 37 (10)2103-2111; 
OCT 2009, Male Mrp3(- /- ) mice and wild-type FVB 
mice (12-18 weeks). 

Edatrexate 

43.35 

Fanucchi MP, et.al. Cancer Res. 1987 May
1;47(9):2334-9

Male CD rats. % of Dose (in total): 51 ± 4; Parent drug 
accounted for 85% of that. 

EDDP 

36.00 

Baselt RC, et.al. Biochem Pharmacol. 1973 Dec 
1;22(23):3117-20. Sprague-Dawley male rats (200 -
300 g). 

EMDP 

0.20 

Baselt RC, et.al. Biochem Pharmacol. 1973 Dec 
1;22(23):3117-20. Sprague-Dawley male rats (200 -
300 g). 

Emepronium (EME) 

12.00 

Neef C, et.al. Naunyn Schmiedebergs Arch Pharmacol.
1984 Dec;328(2):103-10. Male Wistar rats
(approximately 300g). % of Dose (in total): 60; Parent
drug accounted for < 20% of that

Epirubicin (4'-epiDOX) 

20.00 

Broggini M, et.al. Cancer Treat Rep. 1980 Aug- 
Sep;64(8-9):897-904. CD-COBS male rats (body 
weight, 200 ± 20 g) 

Erythromycin 

32.20 

Akashi M, et.al. Hepatol Res. 2006 Feb 11,193-198
Male Sprague-Dawley rats weighing approximately
270g.
Kageyama M, et.al. Biol Pharm Bull. 2005
Feb;28(2):316-22. Male Wistar rats (280 to 320g).
Amount excreted into bile: 200.3 ± 35.6 µg. Dose: 3
mg/kg.
Sato A, et.al. Pharmacology. 1999 Nov;59(5):249-56
Tachizawa H, et.al. J Gastroenterol Hepatol. 2004
Sep;19(9):1016-22. Male Sprague-Dawley rats 270g.
Lam JL, et.al. Drug Metab Dispos. 2006
Aug;34(8):1336-44. Male Wistar rats (200 - 350g).
CLtot: 47.2 ± 12.5 and 42.1 ± 5.7 ml/min/kg.
CLbiliary: 15.5 ± 2.9 and 11.2 ± 2.0 ml/min/kg.

Estradiol-17β-glucuronide

87.00 

Akashi M, et.al. Hepatol Res. 2006 Feb 11,193-198 

Estrone 3-sulphate 

18.40 

Fukuda H, Ohashi R, et al. Drug Metab Dispos. 2008
Jul;36(7):1275-82. Male Sprague-Dawley rats (Charles
River Japan, Yokohama, Japan) weighing 200 to 250 g

Felodipine 

0.00 

Sutfin TA, et.al. Xenobiotica. 1987 Oct;17(10):1203-14.

Male SD rats (350g). % of Dose (in total): 74; No 
unchanged felodipine was detected in either bile. 

Fexofenadine 

55.05 

Tahara H, et.al. Drug Metab Dispos. 2005 
Jul;33(7):963-8 

SD rats, 300-350g. CLtot: 28.3 ± 2.1 ml/min/kg;
CLbiliary: 11.4 ± 1.6 ml/min/kg

Tian X., Swift B. Drug Metab Dispos. (2008)36(5), 
911-915 

Floctafenin 

8.90 

Pottier J, et.al. Drug Metab Dispos. 1975 May-
Jun;3(3):133-47. Wistar or Sprague-Dawley rats (200
g).

Flomoxef 

17.50 

Hishikawa S, et.al. Chronobiol Int. 2003 
May;20(3):463-71. Male Wistar rats weighing 250-300 
g 

Fluvastatin 

19.50 

Lindahl A, et.al. Mol Pharm. 2004 Sep-Oct;1(5):347-56
Male Sprague-Dawley rats (305 ± 20g)

Fosmidomycin 

0.10 

Murakawa T, et.al. Antimicrob Agents Chemother. 
1982 Feb;21(2):224-30. 

FPL 55712 

50.00 

Mead B, et.al. J Pharm Pharmacol. 1981 
Oct;33(10):682-4 

Male Wistar rats 

Furosemide 

1.17 

Chen C, et.al. Pharm Res. 2003 Jan;20(1):31-7.
Male Sprague-Dawley rats, 15 weeks of age (385 - 
550g) 

Gemfibrozil 

0.10 

Dix KJ, et.al. Drug Metab Dispos. 1999 Jan;27(1):138-46

Female Sprague-Dawley rats (10-12 weeks old). 

Glutarylsulphathiazole

42.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt.) 

Grepafloxacin 

5.81 

Sasabe H, et.al. J Pharmacol Exp Ther. 1998
Mar;284(3):1033-9. Male Sprague-Dawley rats
weighing approximately 250 to 300g.
Sasabe H, et.al. J Pharmacol Exp Ther. 1998
Feb;284(2):661-8. Male Sprague-Dawley (SD) rats
weighing approximately 250 to 300g.
Yamaguchi H, et.al. J Pharmacol Exp Ther. 2002
Mar;300(3):1063-9. Male Wistar rats, 200-240g.
Yamaguchi H, et.al. Pharm Res. 2004 Feb;21(2):330-8.
Male Wistar rats weighing 200-250g

Hexafluorenium

34.00 

Meijer DK, et.al. Eur J Pharmacol. 1971 
May;14(3):280-5 

Male Wistar rats weighing 200-250g 

Hexahydrophthalylsulfathiazole 

80.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7.
Female Wistar albino rats (180-350g body wt.) 

Hippuric acid 

0.00 

Abou-el-makarem M.M., Millburn P, et al Biochem.
J.(1967)105, 1269 

ID-6105 

19.76 

Yoo BI, et.al. Biol Pharm Bull. 2005 Apr;28(4):688-93 
Male SD rats (230 - 250 g) 

Yoo BI, et.al. Arch Pharm Res. 2005 Apr;28(4):476-82
Male SD rats (230 - 250 g).

Indocyanine Green 

30.00 

Jansen PL et.al. Am J Physiol. 1993 Sep;265(3 Pt
1):G445-52. Male Wistar rats, weighing 250-300g
Kurisu H, et.al. Life Sci. 1991;49(14):1003-11.
Sprague-Dawley Rat
Verkade HJ, et.al. Gastroenterology. 1990
Nov;99(5):1485-92. Normal Wistar rats weighing 280-
320 g.
Sathirakul K, et.al. J Pharmacol Exp Ther. 1993
Jun;265(3):1301-12. Male SD rats weighing
approximately 280 g.
Takikawa H, et.al. J Gastroenterol Hepatol. 1998
Apr;13(4):427-32. Male Sprague-Dawley rats
weighing approximately 270g.
Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7.
Female Wistar albino rats (180-350g body wt.)
Tachizawa H, et.al. J Gastroenterol Hepatol. 2004
Sep;19(9):1016-22. Male Sprague-Dawley rats 270g.
Kimura T, et.al. Biol Pharm Bull. 1993
Nov;16(11):1140-5. Male Wistar rats weighing 200-
300g
Chan PK, et.al. J Toxicol Environ Health. 1981
Feb;7(2):169-79.

Indomethacin 

2.06 

Kouzuki H, et.al. Pharm Res. 2000 Apr;17(4):432-8 
SD rats of 302-368 g body weight. 

Iododoxorubicin (IODOX)

22.00 

Edwards DM, et.al. Drug Metab Dispos. 1991 Sep- 
Oct;19(5):938-45. Male SD rats (mean weight 201 ± 
6g). % of Dose (in total): 34; parent drug accounted for 
< 6% of that. 

Irinotecan (CPT-11) (lactone 
form) 

7.34 

Chu XY, et.al. J Pharmacol Exp Ther. 1997
Apr;281(1):304-14. Male SD rats weighing 250 to
300g.
Arimori K, et.al. Pharm Res. 2003 Jun;20(6):910-7
Male Wistar rats from 280 to 340g.
Itoh T, et.al. J Pharm Pharm Sci. 2004 Jan 23;7(1):13-8.
Male Wistar rats, aged 6 to 7 weeks (180-230 g)

J-104132 

99.70 

Kobayashi N, et.al. Pharm Res. 2003 Jan;20(1):89-95
Male SDRs (250-470 g). 

Lamotrigine 

1.40 

Maggs JL, et.al. Chem Res Toxicol. 2000 
Nov;13(11):1075-81. Male Wistar rats (180-250g)

Levofloxacin 

9.04 

Yamaguchi H, et.al. J Pharmacol Exp Ther. 2002
Mar;300(3):1063-9
Male Wistar rats, 200-240g.

Yamaguchi H, et.al. Pharm Res. 2004 Feb;21(2):330-8. 
Male Wistar rats weighing 200-250g 

Lissamine Fast Yellow 

87.50 

Bertagni P, et.al. J Pharm Pharmacol. 1972 
Aug;24(8):620-4. Male and female Wistar albino rats 
(190-350 g). 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7. 
Female Wistar albino rats (180-350g body wt). 

Lithocholate (LC) 

0.98 

Takikawa H,et.al. Hepatology. 1991 Aug;14(2):352-60. 
Male SDRs weighing about 270g. % of dose (in total): 
98% ± 1.6%. Parent drug accounted for 1% ± 1% of 
that. 

Lomefloxacin 

4.26 

Sasabe H, et.al. Biopharm Drug Dispos. 1999. 
Apr;20(3):151-8. Male SD rats weighing approximately 
250-300g. 

Lopinavir 

0.40 

Kumar GN, et.al. Pharm Res. 2004 Sep;21(9):1622-30
Sprague-Dawley rats

Loteprednol etabonate 

4.84 

Wu.W, F. Huang; J of pharmacy and pharmacology, 
60(3),2008, 291-297 

LTC4(leukotriene C4) 

23.10 

K.Niinuma,Y.Kato, et al American journal of 
physiology: Gastrointestinal & liver physiology, 1999 
;276(5 Pt 1)1153-1164 

Denzlinger C, Grimberg M, Kapp A, Haberl C, 
WILMANNS W , British journal of pharmacology; 
1991 102 (4),865-870, male Wistar rats(180-220 g) 

LY110264 

34.40 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY112384 

84.70 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY126351 

11.00 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500

LY78989 

74.20 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY85834 

40.30 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY87780 

93.80 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY88011 

49.60 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500 

LY89439 

49.60 

Wright W.E., Line V.D. Antimicrobial Agents & 
Chemotherapy(1980)17, 842-846. Male Wistar rats 
[HAP(WI)BR], weighing 350 to 500

Merck compound A 

30.00 

Giuliano C, et.al. Xenobiotica. 2005 Oct-Nov;35(10- 
11): 1035-54. Male Sprague-Dawley rats weighing 250- 
300g. 

Meropenem 

80.20 

Yl.chan, MH.Chou, J Chromatogr A. 2002 Jun 
28;961(1):119-24. Male specific pathogen-free 
Sprague-Dawley rats. 

Methadone 

8.80 

Baselt RC, et.al. Biochem Pharmacol. 1973 Dec 
1;22(23):3117-20. Sprague-Dawley male rats (200 -
300 g). 

Methasquin (NSC 122870) 

29.00 

Rader JI, et.al. Cancer Res. 1971 Jul;31(7):964-9 
CD males, 230 to 420 g 

Methotrexate 

72.00 

Masuda M, et.al. Cancer Res. 1997 Aug 
15;57(16):3506-10. Male SDRs (250 - 300g). 

Lutz Fahrig, Helmut Brasch, et al. Cancer Chemother 
Pharmacol (1989) 23, 156-160

Sasaki M, et.al. Mol Pharmacol. 2004 Sep;66(3):450-9 
Male SD rats, 240-260g. CLtot: 12.7 ±1.9 ml/min/kg; 
CLbiliary: 10.7± 1.7 ml/min/kg 

Chen C, et.al. Pharm Res. 2003 Jan;20(1):31-7.

Male Sprague-Dawley rats, 15 weeks of age (385 - 
550g) 

Griffin D, et.al. Cancer Chemother Pharmacol. 
1987;19(1):40-1 

Ueda K, et.al. J Pharmacol Exp Ther. 2001 
Jun;297(3):1036-43. Male Sprague-Dawley rats 
weighing 250 to 300 g.

Bremnes RM, et.al. Cancer Res. 1989 May 1;49(9):2460-4

Male Wistar rats weighing 220-300 g. 
Steinberg SE, et.al. Cancer Res. 1982 Apr;42(4):1279- 
82. 

Female Sprague-Dawley rats weighing 175 to 250 g. 

Methyl orange 

55.00 

O'reilly W.J., Pitt P.A. et al, Br. J. Pharmac (1971), 43, 
167-179. 

Methylphenyldipropylammonium iodide

17.00 

Hughes R.D., Millburn P., et al, Biochem. J. (1973)136, 
967-78. 

Mitoxantrone 

6.08 

Yang XN, Morris ME. J OF PHARM SCI, vol 99 (5) 
Pages: 2502-2510, May 2010. Male Sprague-Dawley 
(SD) rats (300-430 g). 

Morphine 

9.03 

Roerig DL, et.al. Biochem Pharmacol. 1974 Apr
15;23(8):1331-9. Sprague-Dawley male rats (300 -
400g). % of dose (in total): 49.3 ± 3.6. 

Peterson RE, et.al. J Pharmacol Exp Ther. 1973 
184(2):409-18. Male SD rats (325-450 g). % of dose (in 
total): 63. Parent drug accounted for 17.0 ± 2.3% of 
that. 

Smith DS, et.al. Biochem Pharmacol. 1973 Feb 
15;22(4):485-92. Male SD rats (350-450 g). % of dose 
(in total): 64 ± 5. Parent drug accounted for 10% of 
that. 

Moxalactam (latamoxef) 

20.50 

Uchida K, et.al. J Pharmacobiodyn. 1985 
Nov;8(11):981-8

Wistar strain male rats, 8 weeks of age. 
Mizojiri K, et.al. Antimicrob Agents Chemother. 1987 
Aug;31(8):1169-76. Male Sprague-Dawley rats 
(weight, 250 to 320 g) 

MX-68 

84.00 

Han YH, et.al. J Pharmacol Exp Ther. 1999 
Oct;291(1):204-12. Male Sprague-Dawley rats (SDRs)
weighing 250 to 300g. 

N2-methyl-9-hydroxyolivacinium

2.20 

Maftouh M, et.al. Xenobiotica. 1983 May;13(5):303-10 
Male SD rats (300 - 350g). 

Nafenopin 

4.00 

Jedlitschky G, et.al. Biochem Pharmacol. 1994 Sep 
15;48(6):1113-20. % of Dose (in total): 40; Parent drug 
accounted for 10% of that. 

Naftopidil 

6.60 

Niebch G, et.al. Arzneimittelforschung. 1991 
Oct;41(10):1027-32. Male Sprague-Dawley rats (150-200g)

NAPAP 

37.90 

Hauptmann J, et.al. Biomed Biochim Acta. 
1987;46(6):445-53. Wistar rats of both sexes, body
weight 260-340g. 

Hauptmann J, et.al. Pharmazie. 1991 Jan;46(1):57-8

Napsagatran 

61.00 

Lave T, et.al. J Pharm Pharmacol. 1999 Jan;51(1):85-91
Male rats (230-290 g), SPF, RoRo albino

Nelfinavir 

0.05 

Kageyama M, et.al. Biol Pharm Bull. 2005 
Feb;28(2):316-22. Male Wistar rats (280-320 g). 
Amount excreted into bile: 0.359 ± 0.027 µg. Dose: 2.5
mg/kg. 

Nitrofurantoin 

5.16 

Wang X, et.al. Drug Metab Dispos. 2007 
Feb;35(2):268-74. Female SD rats (220g) 

N-Methylpyridinium iodide 

0.80 

Hughes R.D., Millburn P., et al, Biochem. J. (1973)136, 
967-78 

Octreotide 

50.00 

Yamada T, et.al. Biol Pharm Bull. 1998 Aug;21(8):874- 
8 
Male Sprague-Dawley rats weighing approximately
220g. Yamada T, et.al. J Pharmacol Exp Ther. 1996
Dec;279(3):1357-64. Male SDR (approximately 220g).
CLtot: 10.53 ± 0.38 ml/min/kg. CLbiliary: 4.15 ± 0.21
ml/min/kg.

Yamada T, et.al. Drug Metab Dispos. 1997
May;25(5):536-43. Male SDRs weighing 220g. CLtot:
12.63 ± 0.56 ml/min/kg. CLbiliary: 7.44 ± 0.29
ml/min/kg.

Lemaire M, et.al. Drug Metab Dispos. 1989 Nov- 
Dec;17(6):699-703. 

Orthanilic acid 

0.00 

Abou-el-makarem M.M, Millburn P., et al Biochem. 
J.(1967)105, 1269 

Paclitaxel (taxol) 

11.62 

Monsarrat B, et.al. J Natl Cancer Inst Monogr. 
1993;(15):39-46. Sprague-Dawley rats. 

Monsarrat B, et.al. Drug Metab Dispos. 1990 Nov-
Dec;18(6):895-901. 

Luo G, Johnson S, et al, Drug Metab Dispos. J.
(2010)38, 422-430 

PAEB (procaine amide ethobromide) - not in other tables

32.20 

Watkins JB 3rd, et.al. Drug Metab Dispos. 1987 Mar- 
Apr;15(2):177-83. Male Sprague-Dawley rats.
Alterations in biliary excretory function by 
streptozotocin-induced diabetes 

Pancuronium 

3.50 

Upton RA, et.al. Anesth Analg. 1982 Apr;61(4):313-6 
Male Sprague-Dawley rats, weighing 250-350g.

Paraquat di-iodide 

0.50 

Hughes R.D., Millburn P., et al, Biochem. J. (1973)136, 
979-984 

Pefloxacin 

3.94 

Montay G, et.al. Antimicrob Agents Chemother. 1984 
Apr;25(4):463-72. Male Wistar rats (200 to 300g) 

Penicillin G (benzylpenicillin) 

20.78 

Tsuji A, et.al. J Pharm Sci. 1983 Nov;72(11):1239-52.
Male Wistar rats (240g). 

Ito K, et.al. Am J Physiol Gastrointest Liver Physiol. 
2004 287(1):G42-9. Male SD rats weighing 240-300g.
% of dose (in total): 31.7; Parent drug accounted for 
50% of that. 

Penicillin V 

29.50 

Tsuji A, et.al. J Pharm Sci. 1983 Nov;72(11):1239-52.
Male Wistar rats (240g). 

Phenolphthalein 

2.00 

Millburn P,et al, Biochem. J. 1967; 105, 1275 
Female Wistar albino rats (weighing 200 ± 10 g).

Phenolphthalein disulphate 

74.00 

Hirom PC, et.al. Biochem J. 1972 Oct;129(5):1071-7.
Female Wistar albino rats (180-350g body wt.) 

Phenolphthalein glucuronide 

14.10 

Itagaki S, et.al. Drug Metab Pharmacokinet.
2003;18(4):238-44. Male SD rats (300-350g). Amount
excreted into bile in 1 hr: 311 ± 23.4 nmol/kg. Dose:
2.2 µmol/kg.

Phenolsulfonephthalein (PSP, 
Phenol Red) 

14.10 

Itagaki S, et.al. Drug Metab Pharmacokinet.
2003;18(4):238-44. Male SD rats (300-350g). Amount
excreted into bile in 1 hr: 311 ± 23.4 nmol/kg. Dose:
2.2 µmol/kg.

Phenytoin (Diphenylhydantoin) 

0.40 

Inaba T, et.al. Drug Metab Dispos. 1975 Mar- 
Apr;3(2):69-73. Wistar rats (250-330 g). % of Dose (in 
total): 28 or 54, Parent drug accounted for about 0.3 - 
1.1% of that. 

El-Hawari AM, et.al. J Pharmacol Exp Ther. 1977 
Apr;201(1):14-25. Male SD rats (180-280 g). % of
Dose (in total) in 2 hr: 32, Parent drug accounted for
1.9 ± 0.2% of that.

PhIP

3.09 

Dietrich CG, et.al. Carcinogenesis. 2001 May;22(5):805-11
Female Wistar rats (200 - 250g).

Pipecuronium 

4.48 

Bodrogi L, et.al. Arzneimittelforschung.
1980;30(2a):366-70. Female rats weighing 200 to 320g.
% of Dose (in total): 6.36; Parent drug accounted for 69 
- 72% of that. 

Pitavastatin 

76.15 

Hirano M, et.al. Mol Pharmacol. 2005 Sep;68(3):800-7 
Male Sprague-Dawley rats weighing approximately 250 
to 300g. 

Fujino H, et.al. Drug Metab Pharmacokinet.
2002;17(5):449-56. Male Sprague-Dawley rats
weighing approximately 250g 

Pravastatin 

76.15 

Akashi M, et.al. Hepatol Res. 2006 Feb 11,193-198
Male Sprague-Dawley rats weighing approximately 270g.

Fukumura S, et.al. Pharm Res. 1998 Jan;15(1):72-6
Male Sprague-Dawley rats (SDR) approximately 270g.
Marumo T, et.al. J Gastroenterol. 2004 Oct;39(10):981-7.
Male Sprague-Dawley rats weighing approximately 270g.
Sasaki M, et.al. Mol Pharmacol. 2004 Sep;66(3):450-9
Male Sprague-Dawley rats weighing approximately 240
to 260g.

Takikawa H, et.al. J Gastroenterol Hepatol. 1998
Apr;13(4):427-32. Male Sprague-Dawley rats
weighing approximately 270g.

Ohashi M, et.al. Pharmacology. 2002 Sep;66(1):31-5.
Ogasawara T, et.al. Hepatol Res. 2001 Jun;20(2):221-231
Male Sprague-Dawley rats weighing approximately 270g.
Niinuma K, Kato Y, et al, Am J Physiol. 1999;276(5 Pt 1)1153-1164.
Fukuda H, Ohashi R, et al. Drug Metab Dispos. 2008
Jul;36(7):1275-82

Probenecid 

13.62 

Conway W, et.al. J Pharm Sci. 1974 Oct;63(10):1551-4
Male SD rats weighing 420-530g.

Guarino AM, et.al. J Pharmacol Exp Ther. 1968 
Dec;164(2):387-95. Male Sprague-Dawley rats, 
weighing 250 to 320g. % of Dose (in total): 85.5 ± 2.7, 
57.9 ± 4.0, 25.4 ± 3.4. Parent drug accounted for 
16.2%, 37.7% and 34.6% of that. 

Prostacyclin (PGI2)

0.00 

Taylor BM, et.al. J Pharmacol Exp Ther. 1980 
Jul;214(1):24-30 Female SD rats (200 - 250g)

Proxicromil 

4.40 

Smith DA, et.al. Eur J Drug Metab Pharmacokinet. 
1983 8(3):225-32. CRCD rats. Amount excreted into 
bile: 110 µg. Dose: 10 mg/kg. Weight assumed to be
250 g. 

PSC 833(Valspodar) 

0.86 

Song S, et.al. Drug Metab Dispos. 1998 
Nov;26(11):1128-33. Female Sprague-Dawley rats (10
weeks of age, weighing 220-270g) 

QMPB 

0.00 

Christensen A, et.al. Xenobiotica. 1990 Apr;20(4):417- 
34 

female Sprague-Dawley rat, body wt 200g 

Ramatroban 

16.00 

Moriwaki T, et.al. Pharm Res. 2004 Jun;21(6):1055-64
SDR weighing 200-220g. % of dose (in total): 28.5 ± 
2.6, Parent drug accounted for 56% of that. 

R-benoxaprofen 

0.70 

Mohri K, et.al. Pharm Res. 2005 Jan;22(1):79-85
Male SD rats (250 - 300g)

R-carprofen 

9.84 

Kemmerer JM, et.al. J Pharm Sci. 1979
Oct;68(10):1274-80. Male rats (200-300g)

Remikiren 

34.60 

Coassolo P, et.al. Xenobiotica. 1996 Mar;26(3):333-45 
Male albino SPF rats (weight 280-320 g) 

Reproterol 

4.09 

Kucharczyk N, et.al. Arzneimittelforschung.
1981;31(12):2085-8. Male Charles River rats (165-275g).
% of Dose (in total): 45.33 ± 4.62; Parent drug
accounted for 1.7 to 13% of that.

R-grepafloxacin 

4.43 

Sasabe H, et.al. Biopharm Drug Dispos. 1999. 
Apr;20(3):151-8. Male SD rats weighing approximately 
250-300g 

Rhodamine 123 

3.72 

Kageyama M, et.al. Biol Pharm Bull. 2006
Apr;29(4):779-84. Male Wistar rats, 300 ± 20g.
Amount excreted into bile over 2 hr: 2.23 ± 0.06 µg.
Dose: 0.2 mg/kg.

Kageyama M, et.al. Biol Pharm Bull. 2005
Feb;28(2):316-22. Male Wistar rats: 280-320g.
Amount excreted into bile: 2.79 ± 0.37 µg. Dose: 0.2
mg/kg.

Yumoto R, et.al. Drug Metab Dispos. 2001 
Feb;29(2):145-51. Male Wistar rats weighing 230 to 
300g. 

Kageyama M, et.al. Biol Pharm Bull. 2005
Jan;28(1):130-7. Male Wistar: 300 ± 20g. Amount
excreted into bile over 2 hr: ~ 2000ng. Dose: 0.2
mg/kg.

Ritonavir 

3.40 

Denissen JF, et.al. Drug Metab Dispos. 
1997 Apr;25(4):489-501. SD rats (220-270g). % of
Dose (in total) in 6 hr: 79.7; Parent drug accounted for 
1.9% of that (Male). % of Dose (in total) in 6 hr: 41.6; 
Parent drug accounted for 12.7% of that (Female). 

Rivaroxaban 

48.40 

Weinz C, Schwarz T, Kubitza D. et al. (2009). Drug 
Metab Dispos. 2009;37(5):1056-64.

Rosuvastatin 

56.90 

Kitamura S, et.al. Drug Metab Dispos. 2008
Oct;36(10):2014-23. Male Sprague-Dawley rats (9 
weeks old). 

H.Fukuda, R.Ohashi, et al, Drug Metab Dispos. 2008
Jul;36(7):1275-82. Male Sprague-Dawley rats (Charles 
River Japan, Yokohama, Japan) weighing 200 to 250 g 

Salicylic acid 

4.40 

H.Fukuda, R.Ohashi, et al, Drug Metab Dispos. 2008
Jul;36(7):1275-82. Male Sprague-Dawley rats (Charles 
River Japan, Yokohama, Japan) weighing 200 to 250 g 

SB-265123 

2.80 

WARD K, et.al. Drug Metab Dispos. 1999 
Nov;27(11):1232-41. male Sprague-Dawley rats
weighing 290 to 350 g 

S-benoxaprofen 

3.00 

Mohri K, et.al. Pharm Res. 2005 Jan;22(1):79-85
Male SD rats (250 - 300g) 

S-carprofen 

5.70 

Kemmerer JM, et.al. J Pharm Sci. 1979
Oct;68(10):1274-80. Male rats (200-300g)

S-grepafloxacin 

3.66 

Sasabe H, et.al. Biopharm Drug Dispos. 1999 
Apr;20(3):151-8. Male SD rats weighing approximately 
250-300g 

Sitagliptin 

16.39 

Beconi MG, et.al. Drug Metab Dispos. 2007 
Apr;35(4):525-32. Male SD rats (360 - 450g). 

SK&F 110679 

53.10 

Davis CB, et.al. Drug Metab Dispos. 1994 Jan- 
Feb;22(1):90-8. Male Sprague-Dawley rats.

SN-38 carboxylate 

6.72 

Itagaki S, Sasaki K, et al, J Pharm Pharm Sci.
(2004)23;7(1), 8-13. Male Wistar rats, aged 6 to 7 
weeks (180-230 g in weight). 

SN-38 lactone 

2.43 

Itagaki S, Sasaki K, et al, J Pharm Pharm Sci.
(2004)23;7(1), 8-13. Male Wistar rats, aged 6 to 7 
weeks (180-230 g in weight). 

SN-38-glucuronide carboxylate 

7.00 

Itagaki S, Sasaki K, et al, J Pharm Pharm Sci.
(2004)23;7(1), 8-13. Male Wistar rats, aged 6 to 7 
weeks (180-230 g in weight). 

SN-38-glucuronide lactone 

21.90 

Itagaki S, Sasaki K, et al, J Pharm Pharm Sci.
(2004)23;7(1), 8-13. Male Wistar rats, aged 6 to 7 
weeks (180-230 g in weight). 

Stilboestrol 

2.92 

Millburn P,et al, Biochem. J. 1967; 105, 1275 
Female Wistar albino rats (weighing 200 ± 10 g). % of
Dose (in total): 94; Parent drug accounted for 3% of 
that. 

Stilboestrol glucuronide 

89.00 

Millburn P,et al, Biochem. J. 1967; 105, 1275 
Female Wistar albino rats (weighing 200 ± 10 g). % of
Dose (in total): 100; Parent drug accounted for 89% of 
that. 

Succinylsulphathiazole 

33.00 

Hirom P.C., Millburn P., et al, Biochem. J. (1972)129,
1071-1077, Female Wistar albino rats (180-350g body 
wt.) 

Sulfaethidole 

18.50 

Kekki M, et.al. J Pharmacokinet Biopharm. 1982 
Feb;10(1):27-51. Male Sprague-Dawley rats weighing
356 ± 12 g 

Sulphanilic acid 

0.69 

McMahon KA, et.al. Food Cosmet Toxicol. 1969 
Sep;7(5):497-500. Rats (250-350 g body weight).
Abou-el-makarem M.M., Millburn P., et al Biochem.
J.(1967)105, 1269

Tartrazine 

19.11 

Hirom P.C., Millburn P., et al, Biochem. J. (1972)129,
1071-1077, Female Wistar albino rats (180-350g body 
wt.) 

Gregson RH, et.al. J Pharm Pharmacol. 1972 
Jan;24(1):20-4. Male and female Wistar albino rats,
190-210g.
Bertagni P, et.al. J Pharm Pharmacol. 1972. 24(8):620-4

Taurocholate 

96.00 

Akashi M, et.al. Hepatol Res. 2006 Feb 11,193-198
Male Sprague-Dawley rats weighing approximately
270g

Takikawa H, et.al. Hepatology. 1996 Mar;23(3):607-13.
Male Sprague-Dawley rats (SDR) approximately 270g.
Fukumura S, et.al. Pharm Res. 1998 Jan;15(1):72-6
Male Sprague-Dawley rats (SDR) approximately 270g.
Kuipers F, et.al. J Clin Invest. 1988 May;81(5):1593-9
Wistar rats

Jansen PL, et.al. Hepatology. 1987 Jan-Feb;7(1):71-6.
Homozygous TM rats (200 to 250g)

Bowmer CJ, et.al. Br J Pharmacol. 1984
Nov;83(3):773-82
Male Wistar albino rats (250-350g)

Bode KA, et.al. Biochem Pharmacol. 2002 Jul
1;64(1):151-8
Male Wistar rats weighing about 180 - 220g.
Meijer DK, et.al. Drug Metab Dispos. 1976 Jan-
Feb;4(1):1-7. Male Wistar rats weighing about 275g.
Watkins JB, et.al. Drug Metab Dispos. 1987 Mar-
Apr;15(2):177-83. Male Sprague-Dawley rats.

Telithromycin 

13.80 

Yamaguchi S, et.al. Antimicrob Agents Chemother. 
2006 Jan;50(1):80-7. Male SD rats, 270-280g. CLsys:
6.97 ± 0.22 L/hr/kg. CLbiliary: 4.41 ± 0.21 ml/min. 

Temazepam 

0.50 

Tse FL, et.al. J Pharm Sci. 1983 Mar;72(3):311-2 
Male Wistar strain rats average weight 250g 

Temocaprilat 

67.16 

Takikawa H, et.al. Hepatol Res. 2002 Oct;24(2):136 
Male Sprague-Dawley rats (270 g) 

Ishizuka H, et.al. J Pharmacol Exp Ther. 1997 
Mar;280(3):1304-11. Male Sprague-Dawley rats (7
weeks old). Ishizuka H, et.al. J Pharmacol Exp Ther.
1999 Sep;290(3):1324-30. Male Sprague-Dawley (SD)
rats.

Terbutaline 

7.88 

Eriksson H, et.al. Acta Physiol Scand. 1975 
Sep;95(1):1-5
Male SPF Sprague-Dawley rats, weighing 250-300g.
CLtot: 5.2 ml/min/kg; CLbiliary: 0.41 ml/min/kg

Tetraethylammonium bromide

0.50 

Hughes R.D, Millburn P., et al, Biochem. J. (1973)136, 
967-78 

Tetrahydrocannabinol

0.07 

Widman M, et.al. Biochem Pharmacol. 1974 Apr 
1;23(7):1163-72. Sprague-Dawley rats. % of Dose (in 
total) in 6 hr: 68, Parent drug accounted for 0.1% of 
that 

Thyroxine (T4) 

3.46 

Wong H, et.al. Toxicol Sci. 2005 Apr;84(2):232-42 
Male Sprague-Dawley rats approximately 8-10 weeks 
old (~ 225-325g). 

Tolrestat 

54.75 

Cayen MN, et.al. Drug Metab Dispos. 1985 Jul- 
Aug;13(4):412-9. Male albino SD rats (200-250 g). % 
of Dose (in total): 73 in 4 hr; Parent drug accounted for 
75% of that. 

TPBE 

0.80 

Dow J, et.al. Xenobiotica. 1982 Oct;12(10):633-43 
Male Sprague-Dawley rats of approx. 150g 

TR-14035 

29.40 

Tsuda-Tsukimoto N, et.al. Pharm Res. 2006 
Nov;23(11):2646-56. Male Sprague-Dawley rats
weighing 250 to 320 g. 

Triamterene 

5.50 

Kau ST, et.al. Drug Metab Dispos. 1975 Sep- 
Oct;3(5):345-51. Male SD rats (200 - 250g)

Tributylmethylammonium (TBuMA)

33.30 

Hong SS, et.al. Pharm Res. 2000 Jul;17(7):833-8. 
Male Sprague-Dawley rats, 7 to 8 weeks of age. 
Han YH, et.al. Drug Metab Dispos. 1999 
Aug;27(8):872-9 

Male Wistar rats (250-300g). 

Hong SS, et.al. Arch Pharm Res. 2005 Mar;28(3):330-4 
Male Sprague-Dawley rats, 7 to 8 weeks of age. 
Neef C, et.al. Naunyn Schmiedebergs Arch Pharmacol. 
1984 Dec;328(2):103-10. Male Wistar rats, weighing
approximately 300g. Lee IK, et.al. Arch Pharm Res.
2002 Dec;25(6):969-72. Male Sprague-Dawley rats
(250-270g). 

Jansen PL, et.al. Hepatology. 1987 Jan-Feb;7(1):71-6.
Wistar rats: 200-250g. 

Triethylmethylammonium (TEMA)

0.39 

Hong SS, et.al. Pharm Res. 2000 Jul;17(7):833-8. 
Male Sprague-Dawley rats, 7 to 8 weeks of age. 
Neef C, et.al. Naunyn Schmiedebergs Arch Pharmacol. 
1984 Dec;328(2):103-10. Male Wistar rats, weighing
approximately 300g. Han YH, et.al. Drug Metab 
Dispos. 1999 Aug;27(8):872-9. Male Wistar rats (250- 
300g). 

Trifluoperazine 

0.30 

Schmalzing G, et.al. Xenobiotica. 1978 Jan;8(1):45-54
Male Wistar rats, 200-250g.

Triiodothyroacetic acid 

1.05 

Rutgers M, et.al. Endocrinology. 1989 Jul;125(1):433-
43. Male Wistar rats (approximately 200 g). % of Dose 
(in total): 42 ± 4; Parent drug accounted for less than 
2.5% of that. 

Trimethylphenylammonium iodide

0.70 

Hughes RD, Millburn P. et al, Biochem. J. (1973)136, 
967-78 

Trimetrexate 

0.80 

Wong BK, et.al. Drug Metab Dispos. 1990 Nov- 
Dec;18(6):980-6 

Male SD rats (333 to 382g). 

UK-224,671 

28.90 

Beaumont K, et.al. Eur J Pharm Sci. 2000 
Nov;12(1):41-50 Male Sprague-Dawley rats

UK-240,455 

23.20 

Webster R, et.al. Xenobiotica. 2003 May;33(5):541-60 
Male Sprague-Dawley rats (300-350g). 

UK-427,857 

40.00 

Walker DK, et.al. Drug Metab Dispos. 2005 
Apr;33(4):587-95 Male Sprague-Dawley rats ( 250g). 

Ulifloxacin (UFX) 

9.10 

Yagi Y, et.al. Drug Metab Pharmacokinet. 
2003;18(6):381-9 Male SD rats aged 7 weeks. 

Valsartan 

42.75 

Yamashiro W, et.al. Drug Metab Dispos. 2006 
Jul;34(7):1247-54 Male Sprague-Dawley (SD) rats (7-
8 weeks old). H.Fukuda, R.Ohashi, et al, Drug Metab
Dispos. 2008 Jul;36(7):1275-82 Male Sprague-Dawley
rats (Charles River Japan, Yokohama, Japan) weighing 
200 to 250 g 

Vecuronium 

46.00 

Upton RA, et.al. Anesth Analg. 1982 Apr;61(4):313-6 
Male Sprague-Dawley rats, weighing 250-350g.

Verlukast (MK-571)

17.75 

Nicoll-Griffith DA, et.al. Drug Metab Dispos. 1995 
Oct;23(10):1085-93 male SD rats (~ 350g)

Vinblastine 

30.00 

Kurihara H, Sano N and Takikawa H. 2005. 20:1069- 
1074. 

Vincristine (VCR) 

42.60 

Song S, et.al. Drug Metab Dispos. 1999 Jun;27(6):689- 
94 

Female Sprague-Dawley (SD) rats weighing 220 to 
270g. 

Castle MC, et.al. Cancer Res. 1976 Oct;36(10):3684-9. 
Male and female SD rats (200 to 250g). 

Voreloxin 

35.20 

Evanchik MJ, et al. Drug Metab Dispos. 2009 
Mar;37(3):594-601. male Sprague-Dawley rats, 
weighing 225 to 275 g 

Xamoterol 

0.00 

Mulder GJ, et.al. Xenobiotica. 1987 Jan;17(1):85-92
Male Wistar rats (body wt approx. 200g). % of Dose (in 
total): 40; No unchanged drug existed. 

YM-13115 

72.20 

Matsui H, et.al. Antimicrob Agents Chemother. 1984 
Aug;26(2):204-7 male SD rats (body weight, 200 to 
250 g) 
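
Several reference notes in Appendix I give enough detail to retrace the tabulated BE% figures. Two patterns recur: total biliary recovery multiplied by the share present as parent drug (e.g. nafenopin, tolrestat), and an absolute amount of parent drug in bile divided by the administered dose (e.g. proxicromil, rhodamine 123). The Python sketch below reproduces those checks. The function names are hypothetical, and the arithmetic is inferred from the notes rather than taken from a stated procedure.

```python
def be_from_total_recovery(total_pct, parent_pct_of_total):
    """BE% of unchanged drug = total biliary recovery (% of dose) x parent share (%)."""
    return total_pct * parent_pct_of_total / 100.0


def be_from_amount(amount_ug, dose_mg_per_kg, body_weight_g):
    """BE% = amount of parent drug in bile / administered dose.

    A dose in mg/kg multiplied by body weight in g gives the dose in micrograms.
    """
    dose_ug = dose_mg_per_kg * body_weight_g
    return 100.0 * amount_ug / dose_ug


if __name__ == "__main__":
    # Values taken from the reference notes in the table above.
    print(round(be_from_total_recovery(40, 10), 2))   # nafenopin
    print(round(be_from_total_recovery(73, 75), 2))   # tolrestat
    print(round(be_from_amount(110, 10, 250), 2))     # proxicromil (250 g assumed in the note)
    print(round(be_from_amount(2.23, 0.2, 300), 2))   # rhodamine 123 (300 g mean weight)
```

Run as a script, this prints 4.0, 54.75, 4.4 and 3.72, which agree with the BE% entries listed for those four compounds.
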
Appendix II. Binding data for P-gp inhibitors
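
Where a row of the table that follows lists IC50, Km and substrate concentration together, the Ki entry appears to follow the Cheng-Prusoff correction for competitive inhibition, Ki = IC50 / (1 + [S]/Km). This reading is inferred from the numbers rather than stated in the appendix, and some rows (for example the K562-MDR calcein data) do not fit it exactly. A minimal Python check, with a hypothetical helper name:

```python
def cheng_prusoff_ki(ic50_um, substrate_conc_um, km_um):
    """Ki (µM) from IC50 (µM), substrate concentration (µM) and Km (µM),
    assuming competitive inhibition."""
    return ic50_um / (1.0 + substrate_conc_um / km_um)


if __name__ == "__main__":
    # (inhibitor, IC50, substrate concentration, Km) as listed in the table.
    rows = [
        ("Nelfinavir / digoxin / Caco-2", 1.4, 5, 177),
        ("Verapamil / vincristine / K562-MDR", 0.2, 0.2, 1.7),
        ("Elacridar / calcein / MDCK II-MDR1", 0.3, 1, 10),
    ]
    for name, ic50, s, km in rows:
        print(name, round(cheng_prusoff_ki(ic50, s, km), 3))
```

The three rows give 1.362, 0.179 and 0.273 µM, matching the tabulated Ki values for those entries.
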


Inhibitor

Substrate

Cell System

IC50 (µM)

Km (µM)

Ki (µM)

Subst Conc (µM)

Reference

LY335979 

Digoxin 

Caco-2 

0.02 

177 

0.023 

5 

Choo et al, 2000 

Elacridar 

Prazosin 

MDCKII- 

MDR1 



0.05 

1 

Rautio et al 2006 

LY335979 

Abacavir 

MDCK II- 

MDR1 

0.07 


0.05 


Shaik et al 2007 

Loperamide 

quinidine 

MDCK II- 

MDR1 



0.1 

3 

Lumen et al 2010 

Reserpine 

Daunomycin 

P388 

lymphoma 



0.14 

0.002 

Lan et al, 1996 

Verapamil 

vincristine 

K562-MDR 

0.2 

1.7 

0.179 

0.2 

Richter et al 2009 

Elacridar 

calcein 

MDCK II- 

MDR1 

0.3 

10 

0.273 

1 

Matsson P et al 

2009 

Elacridar 

Irinotecan 

MDCK II- 

MDR1 

0.38 

46 

0.312 

10 

Luo et al, 2002 

Elacridar 

Digoxin 

Caco-2 



0.39 

0.011 

Tang et al 2002 

Mefloquine 

Daunomycin 

P388 

lymphoma 



0.43 

0.002 

Lan et al, 1996 

Dipyridamole 

Daunomycin 

P388 

lymphoma 



0.52 

0.002 

Lan et al, 1996 

Itraconazole 

calcein 

MDCK- 

MDR1 

0.6 

3.1 

0.581 

0.1 

Cook et al, 2009 

Terfenadine 

Daunomycin 

P388 

lymphoma 



0.63 

0.002 

Lan et al, 1996 

CP147478

Digoxin 

Caco-2 

0.14 


0.75 

5 

Wandal et al, 1999 

Reserpine 

vinblastine 

LLC- 

PK1/MDR1 



0.97 

2 

Ekins et al, 2002 

Cyclosporine 

Prazosin 

MDCK II- 

MDR1 



0.98 

1 

Rautio et al 2006 

Verapamil 

Prazosin 

MDCK II- 

MDR1 



1.18 

1 

Rautio et al 2006 

Gallopamil 

vinblastine 

Caco-2 

1.63 

4.1 

1.308 

1 

Neuhoff et al, 

2000 

Nelfinavir 

Digoxin 

Caco-2 

1.4 

177 

1.362 

5 

Choo et al, 2000 

Tamoxifen 

Daunomycin 

P388 

lymphoma 



1.39 

0.002 

Lan et al, 1996 

D-703 

Digoxin 

Caco-2 

1.6 

177 

1.556 

5 

Pauli-Magnus et 
al, 2000 

Pumafentrine 

calcein 

K562-MDR 

3.12 

0.3 

1.56 

0.25 

Richter et al 2009 

CP99542 

Digoxin 

Caco-2 

3.8 


1.6 

5 

Wandal et al, 1999 

Erlotinib 

vincristine 

K562-MDR 

2 

1.7 

1.787 

0.2 

Richter et al 2009 

Cyclosporin 

Digoxin 

Caco-2 



0.46 

0.011 

Noguchi et al, 

2009 

CP114769

Digoxin 

Caco-2 

0.3 


2 

5 

Wandal et al, 1999 

Quinidine 

Daunomycin 

P388 

lymphoma 



2.05 

0.002 

Lan et al, 1996 

Ketoconazole 

Prazosin 

MDCKII- 

MDR1 



2.38 

1 

Rautio et al 2006 

Chlorpromazine

Daunomycin 

P388 

lymphoma 



2.41 

0.002 

Lan et al, 1996 

Bromocriptine 

calcein 

LLC- 

PK1/MDR1 



2.81 


Ekins et al, 2002 

Ketoconazole 

Digoxin 

Caco-2 

1.2 

177 

1.167 

5 

Cook et al, 2009 

CP117227

Digoxin 

Caco-2 

0.07 


3 

5 

Wandal et al, 1999 

Norverapamil 

vinblastine 

Caco-2 

4.24 

4.1 

3.402 

1 

Neuhoff et al, 

2000 

Promethazine 

Daunomycin 

P388 

lymphoma 



3.45 

0.002 

Lan et al, 1996 

Itraconazole 

Digoxin 

Caco-2 

2 

385 

1.974 

5 

Cook et al, 2009 

Carvedilol 

Digoxin 

Caco-2 

4 

385 

3.949 

5 

Cook et al, 2009 

Bromocriptine 

vinblastine 

LLC- 

PK1/MDR1 



3.96 

2 

Ekins et al, 2002 

Nicardipine 

calcein 

MDCK- 

MDR1 

4.2 

3.1 

4.069 

0.1 

Cook et al, 2009 

Spironolactone 

Daunomycin 

P388 

lymphoma 



4.14 

0.002 

Lan et al, 1996 

Norgallopamil 

vinblastine 

Caco-2 

5.46 

4.1 

4.381 

1 

Neuhoff et al, 

2000 

Mibefradil 

Digoxin 

Caco-2 

1.2 

177 

1.167 

5 

Ekins et al, 2002 

Progesterone 

Daunomycin 

P388 

lymphoma 



4.6 

0.002 

Lan et al, 1996 

Tolafentrine 

calcein 

K562-MDR 

9.46 

0.3 

4.73 

0.25 

Richter et al 2009 

Telmisartan 

Digoxin 

Caco-2 

5 

385 

4.936 

5 

Cook et al, 2009 

Amprenavir 

quinidine 

MDCKII- 

MDR1 



5 

3 

Lumen et al 2010 

Fluphenazine 

Daunomycin 

P388 

lymphoma 



5.52 

0.002 

Lan et al, 1996 

Mibefradil 

calcein 

MDCK- 

MDR1 

6 

3.1 

5.813 

0.1 

Cook et al, 2009 

CP101556 

Digoxin 

Caco-2 

0.6 


5.9 

5 

Wandal et al, 1999 

Ritonavir 

Digoxin 

Caco-2 

3.8 

177 

3.696 

5 

Choo et al, 2000 

Fentanyl 

Digoxin 

Caco-2 

6.5 

177 

6.321 

5 

Ekins et al, 2002 

Ergocryptine 

vinblastine 

LLC- 

PK1/MDR1 



6.43 

2 

Ekins et al, 2002 

Amitriptyline 

Daunomycin 

P388 

lymphoma 



7.53 

0.002 

Lan et al, 1996 

Saquinavir 

Digoxin 

Caco-2 

6.5 

177 

6.321 

5 

Choo et al, 2000 

Montelukast 

Digoxin 

Caco-2 

8 

385 

7.897 

5 

Cook et al, 2009 

Nicardipine 

Digoxin 

Caco-2 

8 

385 

7.897 

5 

Cook et al, 2009 

Verapamil 

fexofenadine 

Caco-2 

8.44 

150 

7.913 

10 

Petri et al, 2004 

Amiodarone 

calcein 

LLC- 

PK1/MDR1 



5.78 

1 

Ekins et al, 2002 

Tiapamil 

vinblastine 

Caco-2 

12 

4.1 

9.645 

1 

Neuhoff et al, 

2000 

Ivermectin 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Lovastatin 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Mitomycin C 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Procainamide 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Carvedilol 

vinblastine 

Caco-2 

13.7 

4.1 

11.017 

1 

Neuhoff et al, 

2000 

Desmethylazelastine

Daunomycin 

LLC- 

PK1/MDR1 

11.8 

24 

11.783 

0.035 

Katoh et al, 2000 

Ergocryptine 

calcein 

LLC- 

PK1/MDR1 



12.2 

1 

Ekins et al, 2002 

CP100356 

Digoxin 

Caco-2 

0.11 


13 

5 

Wandal et al, 1999 

CP12379 

Digoxin 

Caco-2 

0.7 


13 

5 

Wandal et al, 1999 

Desethylamiodarone

Digoxin 

LLC- 

PK1/MDR1 

25.2 

11 

25.143 

0.025 

Katoh et al, 2000 

Ergocristine 

vinblastine 

LLC- 

PK1/MDR1 



13.33 

2 

Ekins et al, 2002 

Nitrendipine 

Digoxin 

Caco-2 

14 

385 

13.821 

5 

Cook et al, 2009 

Ergotamine 

vinblastine 

LLC- 

PK1/MDR1 



14.25 

2 

Ekins et al, 2002 

Gemcabene 

calcein 

MDCK- 

MDR1 

15 

3.1 

14.531 

0.1 

Cook et al, 2009 

Isradipine 

Digoxin 

Caco-2 

15 

385 

14.808 

5 

Cook et al, 2009 

Verapamil 

calcein 

MDCK- 

MDR1 

30 

3.1 

29.063 

0.1 

Cook et al, 2009 

Desethylamiodarone

Daunomycin 

LLC- 

PK1/MDR1 

15.4 

24 

15.378 

0.035 

Katoh et al, 2000 

Felodipine 

calcein 

MDCK- 

MDR1 

16 

3.1 

15.5 

0.1 

Cook et al, 2009 

Quinidine 

Digoxin 

MDCKII- 

MDR1 



0.1 

0.03 

Lumen et al 2010 

Ketoconazole 

calcein 

LLC- 

PK1/MDR1 



24.9 

1 

Ekins et al, 2002 

Azelastine 

Daunomycin 

LLC- 

PK1/MDR1 

16 

24 

15.977 

0.035 

Katoh et al, 2000 

PSC-833 (Valspodar)

Digoxin 

Caco-2 

0.11 


16 


Wandal et al, 1999 

Carvedilol 

calcein 

MDCK- 

MDR1 

17 

3.1 

16.469 

0.1 

Cook et al, 2009 

Repaglinide 

Digoxin 

Caco-2 

17 

385 

16.782 

5 

Cook et al, 2009 

Troglitazone 

calcein 

MDCK- 

MDR1 

19 

3.1 

18.406 

0.1 

Cook et al, 2009 



Amiodarone 

Digoxin 

LLC- 

PK1/MDR1 

5.48 

11 

5.431 

0.1 

Katoh et al, 2000 

Azithromycin 

Digoxin 

Caco-2 

21.8 

177 

21.201 

5 

Ebrel et al, 2007 

Conivaptan 

calcein 

MDCK- 

MDR1 

22 

3.1 

21.313 

0.1 

Cook et al, 2009 

Vinblastine 

Prazosin 

MDCKII- 

MDR1 



21.9 

1 

Rautio et al 2006 

Amiodarone 

Daunomycin 

LLC- 

PK1/MDR1 

22.5 

24 

22.467 

0.035 

Katoh et al, 2000 

CP69042 

Digoxin 

Caco-2 

2.3 


23 

5 

Wandal et al, 1999 

Loperamide 

calcein 

MDCK II- 

MDR1 

26 

10 

23.636 

1 

Matsson P et al 

2009 

MK571 

calcein 

MDCK II- 

MDR1 

26 

10 

23.636 

1 

Matsson P et al 

2009 

Miconazole 

vinblastine 

LLC- 

PK1/MDR1 



26.36 

2 

Ekins et al, 2002 

Felodipine 

Digoxin 

Caco-2 

29 

385 

28.628 

5 

Cook et al, 2009 

Diltiazem 

calcein 

MDCK- 

MDR1 

30 

3.1 

29.063 

0.1 

Cook et al, 2009 

Clotrimazole 

vinblastine 

LLC- 

PK1/MDR1 



29.92 

2 

Ekins et al, 2002 

Isradipine 

calcein 

MDCK- 

MDR1 

31 

3.1 

30.031 

0.1 

Cook et al, 2009 

Troglitazone 

Digoxin 

Caco-2 

31 

385 

30.603 

5 

Cook et al, 2009 

Dipyridamole 

Digoxin 

LLC- 

PK1/MDR1 

40 

11 

40 


Kakumoto 2002 

Ranolazine 

calcein 

MDCK- 

MDR1 

34 

3.1 

32.938 

0.1 

Cook et al, 2009 

Clarithromycin 

Digoxin 

Caco-2 

4.1 

177 

3.987 

5 

Ebrel et al,2007 

Ritonavir 

calcein 

MDCK- 

MDR1 

36 

3.1 

34.875 

0.1 

Cook et al, 2009 

Diltiazem 

Digoxin 

Caco-2 

36 

385 

35.538 

5 

Cook et al, 2009 

Midazolam 

calcein 

K562-MDR 

73.9 

0.3 

36.95 

0.25 

Richter et al 2009 

Erythromycin 

vinblastine 

LLC- 

PK1/MDR1 



37.79 

2 

Ekins et al, 2002 

Conivaptan 

Digoxin 

Caco-2 

39 

385 

38.5 

5 

Cook et al, 2009 

Thioridazine 

calcein 

MDCKII- 

MDR1 

45 

10 

40.909 

1 

Matsson P et al 

2009 

Desmethylazelastine

Digoxin 

LLC- 

PK1/MDR1 

41.8 

11 

41.705 

0.025 

Katoh et al, 2000 

Ergocristine 

calcein 

LLC- 

PK1/MDR1 



42.8 

1 

Ekins et al, 2002 

Lansoprazole 

calcein 

K562-MDR 

86.9 

0.3 

43.45 

0.25 

Richter et al 2009 

Clotrimazole 

calcein 

LLC- 

PK1/MDR1 



44 


Ekins et al, 2002 

Saquinavir 

calcein 

MDCK- 

MDR1 

46 

3.1 

44.563 

0.1 

Cook et al, 2009 

Nifedipine 

calcein 

MDCK- 

MDR1 

47 

3.1 

45.531 

0.1 

Cook et al, 2009 

Omeprazole 

calcein 

MDCK- 

MDR1 

54 

3.1 

52.313 

0.1 

Cook et al, 2009 

Talinolol 

calcein 

MDCK- 

MDR1 

48 

3.1 

46.5 

0.1 

Cook et al, 2009 

Ranolazine 

Digoxin 

Caco-2 

49 

385 

48.372 

5 

Cook et al, 2009 

Indinavir 

Prazosin 

MDCK II- 

MDR1 



50 

1 

Rautio et al 2006 

Nifedipine 

Digoxin 

Caco-2 

53 

385 

52.321 

5 

Cook et al, 2009 

Vinblastine 

Digoxin 

Caco-2 



8.92 

0.011 

Tang et al 2002 

Cortisol 

Digoxin 

Caco-2 

55 

177 

53.489 

5 

Ekins et al, 2002 

Tamoxifen 

Digoxin 

Caco-2 

55 

177 

53.489 

5 

Ekins et al, 2002 

Pantoprazole 

calcein 

K562-MDR 

108 

0.3 

54 

0.25 

Richter et al 2009 

Clarithromycin 

calcein 

MDCK- 

MDR1 

57 

3.1 

55.219 

0.1 

Cook et al, 2009 

Miconazole 

calcein 

LLC- 

PK1/MDR1 



55.5 

1 

Ekins et al, 2002 

Paroxetine 

calcein 

MDCK- 

MDR1 

61 

3.1 

59.094 

0.1 

Cook et al, 2009 

Pantoprazole 

Digoxin 

Caco-2 

69 

385 

68.115 

5 

Cook et al, 2009 

Omeprazole 

vinblastine 

Caco-2 

89 

4.1 

71.411 

1 

Neuhoff et al, 
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Inhibitor 

Substrate 

Cell System 

IC50 

(liM) 

Km 

(liM) 

Ki 

(liM) 

Subst 

Cone 

(liM) 

Reference 








2000 

Fluvastatin 

calcein 

K562-MDR 

151 

0.3 

75.5 

0.25 

Richter et al 2009 

Desmethylcarv 

edilol 

vinblastine 

Caco-2 

97.6 

4.1 

78.311 

1 

Neuhoff et al, 

2000 

Daunomycin 

Digoxin 

Caco-2 

55 

177 

53.489 

5 

Ekins et al, 2002 

Troleandomyci 

n 

vinblastine 

LLC- 

PK1/MDR1 



87.64 

2 

Ekins et al, 2002 

Imipramine 

calcein 

K562-MDR 

180 

0.3 

90 

0.25 

Richter et al 2009 

Alprenolol 

calcein 

K562-MDR 

181 

0.3 

90.5 

0.25 

Richter et al 2009 

Digoxin 

calcein 

K562-MDR 

189 

0.3 

94.5 

0.25 

Richter et al 2009 

Captopril 

calcein 

MDCK- 

MDR1 

100 

3.1 

96.875 

0.1 

Cook et al, 2009 

Cimetidine 

calcein 

MDCK- 

MDR1 

100 

3.1 

96.875 

0.1 

Cook et al, 2009 

Losartan 

calcein 

MDCK- 

MDR1 

100 

3.1 

96.875 

0.1 

Cook et al, 2009 

Milameline 

calcein 

MDCK- 

MDR1 

100 

3.1 

96.875 

0.1 

Cook et al, 2009 

Chlorzoxazone 

Digoxin 

Caco-2 

100 

177 

97.253 

5 

Ekins et al, 2002 

Colchicine 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

Dcbrisoquine 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

Fexofenadine 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

Paclitaxel 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

S- 

Mephenytoin 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

Tolbutamide 

Digoxin 

Caco-2 

100 

111 

97.253 

5 

Ekins et al, 2002 

Ergotamine 

calcein 

LLC- 

PK1/MDR1 



98.9 

1 

Ekins et al, 2002 

Ergometrine 

vinblastine 

LLC- 

PK1/MDR1 



100 

2 

Ekins et al, 2002 

4- 

hydroxycarved 

iolol 

vinblastine 

Caco-2 

128 

4.1 

102.59 

1 

1 

Neuhoff et al, 

2000 
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Inhibitor 

Substrate 

Cell System 

IC50 

(liM) 

Km 

(liM) 

Ki 

(liM) 

Subst 

Cone 

OiM) 

Reference 

Ergocomine 

calcein 

LLC- 

PK1/MDR1 



105.2 

1 

Ekins et al, 2002 

Desipramine 

calcein 

K562-MDR 

221 

0.3 

110.5 

0.25 

Richter et al 2009 

Ergometrine 

calcein 

LLC- 

PK1/MDR1 



115.5 

1 

Ekins et al, 2002 

Chlorprothixen 

e 

calcein 

MDCKII- 

MDR1 

130 

10 

118.18 

2 

1 

Matsson & par 

2009 

Guanabenz 

calcein 

K562-MDR 

250 

0.3 

125 

0.25 

Richter et al 2009 

Losartan 

Digoxin 

Caco-2 

144 

385 

142.15 

4 

5 

Cook et al, 2009 

Verapamil 

Irinotecan 

MDCK II- 

MDR1 

234 

46 

191.83 

8 

10 

Luo et al, 2002 

Avasimibe 

Digoxin 

Caco-2 

200 

385 

197.43 

6 

5 

Cook et al, 2009 

Talinolol 

Digoxin 

Caco-2 

294 

385 

290.23 

1 

5 

Cook et al, 2009 

Sitagliptin 

Digoxin 

Caco-2 

300 

385 

296.15 

4 

5 

Cook et al, 2009 

Sparfloxacin 

Digoxin 

Caco-2 

300 

385 

296.15 

4 

5 

Cook et al, 2009 

Dihydroergocr 

yptine 

calcein 

LLC- 

PK1/MDR1 



360.5 

1 

Ekins et al, 2002 

Fluconazole 

vinblastine 

LLC- 

PK1/MDR1 



400 

2 

Ekins et al, 2002 

Levofloxacin 

Digoxin 

Caco-2 

500 

385 

493.59 

5 

Cook et al, 2009 

Meloxicam 

Digoxin 

Caco-2 

500 

385 

493.59 

5 

Cook et al, 2009 

Orlistat 

Digoxin 

Caco-2 

500 

385 

493.59 

5 

Cook et al, 2009 

Dihydroergocr 

istine 

calcein 

LLC- 

PK1/MDR1 



511 

1 

Ekins et al, 2002 

Etoposide 

Digoxin 

Caco-2 



294 

0.011 

Tang et al 2002 

Etoposide 

Irinotecan 

MDCK II- 

MDR1 

1185 

46 

971.48 

6 

10 

Luo et al, 2002 

Dilevalol 

vinblastine 

Caco-2 

1185 

4.1 

950.81 

1 

Neuhoff et al, 

2000 

Captopril 

Digoxin 

Caco-2 

1000 

385 

987.17 

5 

Cook et al, 2009 
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Inhibitor 

Substrate 

Cell System 

IC50 

(liM) 

Km 

(fiM) 

Ki 

(liM) 

Subst 

Cone 

(liM) 

Reference 






9 



Cimetidine 

Digoxin 

Caco-2 

1000 

385 

987.17 

9 

5 

Cook et al, 2009 

Milameline 

Digoxin 

Caco-2 

1000 

385 

987.17 

9 

5 

Cook et al, 2009 

Paroxetine 

Digoxin 

Caco-2 

1000 

385 

987.17 

9 

5 

Cook et al, 2009 

Dihydroergota 

mine 

calcein 

LLC- 

PK1/MDR1 



1000 

1 

Ekins et al, 2002 

Fluconazole 

calcein 

LLC- 

PK1/MDR1 



1000 

1 

Ekins et al, 2002 

Diacetolol 

vinblastine 

Caco-2 

3520 

4.1 

2824.3 

48 

1 

Neuhoff et al, 

2000 

Cyclosporin 

Daunomycin 

P388 

lymphoma 



0.038 

0.002 

Lan et al, 1996 

Quinidine 

quinidine 

MDCKII- 

MDR1 



0.1 

3 

Lumen et al 2010 

Norverapamil 

Digoxin 

Caco-2 

0.3 

177 

0.292 

5 

Pauli-Magnus et 
al, 2000 

Propafenone 

Daunomycin 

P388 

lymphoma 



0.44 

0.002 

Lan et al, 1996 

Verapamil 

Daunomycin 

P388 

lymphoma 



0.69 

0.002 

Lan et al, 1996 

Verapamil 

Digoxin 

Caco-2 

1.1 

177 

1.07 

5 

Pauli-Magnus et 
al, 2000 

Verapamil 

vinblastine 

Caco-2 

1.48 

4.1 

1.188 

1 

Neuhoff et al, 

2000 

Reserpine 

Digoxin 

Caco-2 



1.38 

0.011 

Tang et al 2002 

Telithromycin 

Digoxin 

Caco-2 

1.8 

177 

1.751 

5 

Ebrel et al, 2007 

Loperamide 

Digoxin 

Caco-2 

2.7 

177 

2.626 

5 

Ekins et al, 2002 

Trifluoperazin 

e 

Daunomycin 

P388 

lymphoma 



3.8 

0.002 

Lan et al, 1996 

Sufentanil 

Digoxin 

Caco-2 

4.2 

177 

4.085 

5 

Ekins et al, 2002 

Cyclosporine 

calcein 

LLC- 

PK1/MDR1 



4.66 

1 

Ekins et al, 2002 
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Inhibitor 

Substrate 

Cell System 

IC50 

(liM) 

Km 

(liM) 

Ki 

(liM) 

Subst 

Cone 

OtM) 

Reference 

Diltiazem 

Daunomycin 

P388 

lymphoma 



5.41 

0.002 

Lan et al, 1996 

Telmisartan 

calcein 

MDCK- 

MDR1 

6 

3.1 

5.813 

0.1 

Cook et al, 2009 

Fluoxetine 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Terfenadine 

Digoxin 

Caco-2 

10 

177 

9.725 

5 

Ekins et al, 2002 

Quinidine 

calcein 

MDCK- 

MDR1 

11 

3.1 

10.656 

0.1 

Cook et al, 2009 

Reserpine 

calcein 

LLC- 

PK1/MDR1 



12.2 

1 

Ekins et al, 2002 

Quinidine 

Prazosin 

MDCKII- 

MDR1 



14 

1 

Rautio et al 2006 

Roxithromycin 

Digoxin 

Caco-2 

15.4 

177 

14.977 

5 

Eberl et al, 2007 

Dihydroergocr 

istine 

vinblastine 

LLC- 

PK1/MDR1 



16 

2 

Ekins et al, 2002 

Dihydroergocr 

yptine 

vinblastine 

LLC- 

PK1/MDR1 



19.82 

2 

Ekins et al, 2002 

Erythromycin 

Digoxin 

Caco-2 

22.7 

177 

22.076 

5 

Eberl et al, 2007 

Ergocomine 

vinblastine 

LLC- 

PK1/MDR1 



24.5 

2 

Ekins et al, 2002 

Testosterone 

calcein 

K562-MDR 

56.4 

0.3 

28.2 

0.25 

Richter et al 2009 

Azelastine 

Digoxin 

LLC- 

PK1/MDR1 

30 

11 

29.932 

0.025 

Katoh et al, 2000 

Haloperidol 

calcein 

MDCK II- 

MDR1 

39 

10 

35.455 

1 

Matsson P et al 

2009 

Nitrendipine 

calcein 

MDCK- 

MDR1 

41 

3.1 

39.719 

0.1 

Cook et al, 2009 

Indinavir 

Digoxin 

Caco-2 

44 

177 

42.791 

5 

Choo et al, 2000 

Midazolam 

Digoxin 

Caco-2 

55 

177 

53.489 

5 

Ekins et al, 2002 

Citalopram 

Digoxin 

Caco-2 

58 

385 

57.256 

5 

Cook et al, 2009 

Vincristine 

Digoxin 

Caco-2 



71.1 

0.011 

Tang et al 2002 

Omeprazole 

Digoxin 

Caco-2 

85 

385 

83.91 

5 

Cook et al, 2009 

Avasimibe 

calcein 

MDCK- 

MDR1 

100 

3.1 

96.875 

0.1 

Cook et al, 2009 


287 







Inhibitor 

Substrate 

Cell System 

IC50 

(!iM) 

Km 

(fiM) 

Ki 

(!iM) 

Subst 

Cone 

OiM) 

Reference 

Caffeine 

Digoxin 

Caco-2 

100 

177 

97.253 

5 

Ekins et al, 2002 

Morphine 

Digoxin 

Caco-2 

100 

177 

97.253 

5 

Ekins et al, 2002 

Amprenavir 

Prazosin 

MDCKII- 

MDR1 



100 

1 

Rautio et al 2006 

Alfentanil 

Digoxin 

Caco-2 

112 

177 

108.92 

3 

5 

Ekins et al, 2002 

Dihydroergota 

mine 

vinblastine 

LLC- 

PK1/MDR1 



119.4 

2 

Ekins et al, 2002 

5- 

hydroxycarved 

iolol 

vinblastine 

Caco-2 

188 

4.1 

151.08 

7 

1 

Neuhoff et al, 

2000 

Erythromycin 

calcein 

LLC- 

PK1/MDR1 

1000 

0.5 

333.33 

3 

1 

Ekins et al, 2002 

Troleandomyci 

n 

calcein 

Caco-2 



483.3 

1 

Ekins et al, 2002 

Gemcabene 

Digoxin 

Caco-2 

1000 

385 

987.17 

9 

5 

Cook et al, 2009 

Labetalol 

vinblastine 

Caco-2 

2194 

4.1 

1760.4 

03 

1 

Neuhoff et al, 

2000 


Appendix III. Inhibitory effect on OATP1B1, OATP1B3 and OATP2B1 mediated transport and molecular descriptors of the 225 investigated compounds



| Compound | OATP1B1 Inhib % | OATP1B3 Inhib % | OATP2B1 Inhib % |
|---|---|---|---|
| Atazanavir | 95 | 92.6 | 90.1 |
| Atorvastatin | 96 | 74.4 | 98.3 |
| Bromosulfalein | 94.1 | 74.9 | 69.3 |
| Cholecystokinin | 89.3 | 96.9 | 60.1 |
| Dipyridamole | 91.9 | 91.8 | 83.1 |
| Fluo-3 | 93.9 | 88.5 | 65.9 |
| Fluvastatin | 80.1 | 76 | 98.2 |
| Glycochenodeoxycholate | 80.2 | 72 | 65.3 |
| Glycodeoxycholate | 85 | 79 | 53.8 |
| Indocyanine green | 86.1 | 100.1 | 79.2 |
| Lopinavir | 85.8 | 84.6 | 86.4 |
| Mifepristone | 81.2 | 80.3 | 70.7 |
| MK-571 | 88.5 | 80.7 | 75.2 |
| Morin | 85.2 | 78.5 | 71.3 |
| Novobiocin | 58.1 | 84.2 | 85.6 |
| Pitavastatin | 97.4 | 94.3 | 63.7 |
| Rifamycin | 95.4 | 101.4 | 74.8 |
| Ritonavir | 92.3 | 85.1 | 93.6 |
| Rosuvastatin | 71.4 | 55.4 | 51.2 |
| Silymarin | 94.5 | 88.5 | 74 |
| Sulfasalazine | 92.1 | 72.1 | 92.3 |
| Taurocholate | 93 | 95.5 | 96.2 |
| Taurodeoxycholate | 81.7 | 61.4 | 85.3 |
| Taurolithocholate | 97.8 | 89.8 | 94.6 |
| Telmisartan | 109.4 | 91.5 | 94.9 |
| Tipranavir | 89.5 | 109 | 99 |
| 5-Carboxyfluorescein diacetate | 70.7 | 71.1 | -24.9 |
| Benzbromarone | 86.6 | 19.5 | 76.3 |
| Budesonide | 73.2 | 79 | 30.3 |
| Cerivastatin | 73.6 | 68.1 | 40.1 |
| Clarithromycin | 73.1 | 53.8 | 5.1 |
| Cyclosporin | 96.8 | 103.7 | 14 |
| Diazepam | 51.6 | 9.3 | 51.4 |
| Diethylstilbestrol | 62.1 | 31.1 | 68.1 |
| Estradiol-17-β-glucuronide | 67.9 | 69 | -12.2 |
| Genistein | 84.9 | -5 | 67.9 |
| GF120918 (Elacridar) | 94.4 | 37.4 | 67.2 |
| Glibenclamide | 92.4 | 49.3 | 77.4 |
| Glycyrrhizic acid | 65.8 | 90.9 | 28.7 |
| Ivermectin | 64.7 | 55.2 | 39 |
| Ko143 | 59.3 | 24.2 | 78.8 |
| Nefazodone | 12 | 61.4 | 61.5 |
| Nelfinavir | 71.3 | 59.3 | 50 |
| Nystatin | 69.3 | 74.9 | 23.4 |
| Paclitaxel | 71.6 | 62.1 | 30.8 |
| PSC833 (Valspodar) | 96.3 | 93.9 | 33.1 |
| Quercetin | 77 | 21.6 | 72.6 |
| Repaglinide | 88.4 | 83.2 | 42.2 |
| Reserpine | 67.2 | 25.4 | 72.3 |
| Rifampicin | 88.3 | 101.7 | 21.2 |
| Taurochenodeoxycholate | 89.5 | 82.4 | 47.2 |
| Vinblastine | 57.6 | 54.3 | -9.3 |
| 17β-estradiol | 102.8 | 39.3 | 47.4 |
| Amprenavir | 79.3 | 16.3 | -3.4 |
| Astemizole | 22.3 | 24.4 | 58.9 |
| Baicalin | 27.6 | 20.9 | 59.1 |
| Candesartan | 52.1 | 28.6 | 28.4 |
| Coumestrol | 73.2 | -21.1 | -41.5 |
| Diclofenac | 77.9 | 27.3 | 19.6 |
| Erlotinib | 10.6 | 27.5 | 93.7 |
| Erythromycin | 58.8 | 45.8 | -19 |
| Estrone-3-sulphate | 98.1 | 10.1 | 20.8 |
| Ezetimibe | 55.5 | 1.5 | -8.9 |
| Flutamide | 6.8 | 9 | 65.8 |
| Gemfibrozil | 59.3 | 14.4 | 15.1 |
| Glycocholic acid | 66.4 | 29 | -9.8 |
| Hoechst 33342 | 48.1 | 67 | 12.7 |
| Indinavir | 71.9 | 18.1 | 17.8 |
| Indometacin | 88.6 | 48.6 | -82.4 |
| Itraconazole | 22.3 | -2.3 | 59.8 |
| Ketoconazole | 53.6 | 24.7 | 44 |
| Levothyroxin | -13.1 | 11.4 | 63.7 |
| Lovastatin | 66.5 | 21.3 | -17.9 |
| Mitoxantrone | 27.2 | 67.9 | 48.3 |
| Nicardipine | 65.1 | 10.1 | 31.8 |
| Nifedipine | 63.7 | -21 | -44.4 |
| N-methylnicotinamide | 71.4 | -0.5 | 14.5 |
| Olmesartan | 60.4 | 37 | -7.9 |
| Ouabain | 50.5 | 20.5 | -24.1 |
| Piroxicam | 16.2 | 23.5 | 68.3 |
| Pravastatin | 52.2 | 5.4 | 36.8 |
| Progesterone | 63.9 | 33.4 | -362.6 |
| Quinine | 54 | -1.3 | -22.9 |
| Rosiglitazone | 79.1 | -32.1 | 10 |
| Saquinavir | 63.8 | 6.6 | 11.5 |
| Simvastatin | 73.1 | 29.3 | 47.4 |
| Spironolactone | 88.3 | -34.2 | -82 |
| Tetracycline | 22.1 | 29.5 | 51.1 |
| Valproic acid | 25.2 | 9.6 | 61.9 |
| Valsartan | 62.2 | 33.7 | 12.3 |
| Vincristine | 3.9 | 59.8 | 13 |
| 1-methyl-4-phenylpyridinium | 14.2 | 10.8 | 2.4 |
| Acarbose | 14.9 | 17.7 | 0.1 |
| Aciclovir | 2.3 | -2 | -15.2 |
| Allopurinol | -35.6 | 10.8 | 8.4 |
| Amantadine | 26.4 | 10 | 17 |
| Amitriptyline | 10.5 | 22 | -8.9 |
| Amodiaquine | 24.2 | 28.2 | 43.4 |
| Atenolol | 20.6 | 7.1 | 6.8 |
| Atomoxetine | -7.5 | 31.6 | 16.3 |
| Berberine | 32 | 19.4 | 18.5 |
| Bestatin | 10.9 | 31.8 | 20.7 |
| Bufuralol | -13.9 | 23.8 | -7 |
| Bupropion | 8.5 | 7.2 | 17 |
| Buspirone | 33.3 | -5.6 | -13.9 |
| Caffeine | -1.8 | 22.1 | 3.2 |
| Captopril | 18.6 | 4 | 39.1 |
| Carbamazepine | 13.3 | 23.2 | 17.3 |
| Carnitine | 36.6 | -9.3 | 15.5 |
| Cefadroxil | 9.4 | 21 | 3.5 |
| Cefamandole | 30.2 | 16.5 | 2.7 |
| Celecoxib | -3.3 | 22.3 | -69 |
| Cetirizine | 35.7 | 40.3 | 29.9 |
| Chelerythrine | -0.7 | 24.9 | 13.1 |
| Chloroquine | -1.4 | -7.2 | 23.5 |
| Chlorpromazine | 27.1 | 22.8 | 4.3 |
| Chlorprothixene | 24 | -6.9 | 26.6 |
| Chlorzoxazone | 20.1 | -1 | 12 |
| Cholic acid | 41.6 | 40.8 | 20.2 |
| Cimetidine | 40.3 | 24.8 | -6.2 |
| Clotrimazole | 32.3 | -120.4 | -37.1 |
| Colchicine | 45.2 | -3.9 | 24.4 |
| Coumarin | 7.6 | 38.2 | 9.3 |
| Daidzein | 35 | 0.7 | 32 |
| Desipramine | 25.9 | 3.2 | -4.4 |
| Dexamethasone | 10.7 | 34.2 | -12.5 |
| Dextromethorphan | 36.3 | -21.9 | 15.9 |
| Digoxin | 36 | -4.7 | 32.1 |
| Diltiazem | 3.2 | 4.1 | -9.1 |
| Disopyramide | 47.5 | 8.6 | 18.5 |
| Disulfiram | 8.5 | -3.8 | 15.5 |
| Dofetilide | 1 | 19.9 | 18.1 |
| Doxazosin | 28.8 | -17.3 | 45.8 |
| Doxorubicin | -5.8 | 26.6 | 49.9 |
| Efavirenz | 40.9 | 18.2 | 43.2 |
| Eletriptan | 31.7 | 2.8 | 0.4 |
| Emtricitabine | 6.4 | 17.1 | 20.7 |
| Enalapril | -7.5 | 4.1 | 19.7 |
| Etoposide | 42.2 | 8.5 | 6.8 |
| Felodipine | 36.6 | 18.4 | 21.4 |
| Fendiline | 15 | 20.4 | -118.7 |
| Fenofibrate | 5.4 | -10.8 | 34.2 |
| Fentanyl | -53.3 | 22.6 | 11.7 |
| Fexofenadine | 28.9 | 6 | -5.8 |
| Fluconazole | 7.4 | 7.5 | 34.1 |
| Fluoxetine | 21.8 | 14.3 | 22.6 |
| Flupenthixol | 30.2 | 13.5 | 9.5 |
| Fluvoxamine | 43.8 | 29.2 | 39 |
| Furafylline | -14.7 | 0.7 | 18.1 |
| Furosemide | 23.4 | 22.3 | 34.5 |
| Glipizide | 1.1 | 25.2 | 22 |
| Glycyl proline | -8.4 | 9.6 | 24.1 |
| Hygromycin | 32.8 | -4.2 | 31.5 |
| Ibuprofen | 47.3 | -6.6 | 13.5 |
| Imipramine | 26.9 | -2.4 | 16.9 |
| Irinotecan | 40.5 | 26.9 | 16.3 |
| Isoniazid | 29.9 | 18.7 | 23.8 |
| Isradipine | 47.5 | 7.7 | 12 |
| Lamotrigine | 24.9 | 14.5 | 13.6 |
| Lansoprazole | 31.4 | 23.3 | -75.7 |
| Lisinopril | 4.6 | 9 | 19.2 |
| Loperamide | 32.5 | 32.4 | 19.6 |
| Loratadine | 43.2 | 45 | 28.1 |
| Mephenytoin | -5.7 | -0.6 | 32.9 |
| Metformin | -3.3 | 19.6 | -4.5 |
| Methotrexate | 27.4 | 17.2 | -13.4 |
| Methoxsalen | 38.1 | 8.9 | -1.6 |
| Metoprolol | 28.1 | 19.8 | 30.5 |
| Midazolam | 32.9 | 6.7 | 22.7 |
| Moclobemide | 1.7 | -3.3 | 27.5 |
| Naringenin | 34.8 | -57.2 | 32.4 |
| Naringin | 34 | 37.7 | 47.1 |
| Nicotine | 6.9 | 11 | -16.5 |
| Nitrofurantoin | 23.3 | -1.5 | 23.1 |
| N-methylpyridinium | 27.9 | -2.3 | 12.8 |
| N-methyl-quinidine | -6.1 | 21 | -10.3 |
| Nootkatone | 25.6 | -3.2 | -72.2 |
| Ofloxacin | -8.2 | 24.5 | 18.4 |
| Omeprazole | 15.9 | 29.5 | 11.3 |
| Ondansetron | 4.8 | -3.6 | -8.7 |
| Oxaliplatin | 13 | 9.9 | 8.2 |
| P-aminohippuric acid | 30.1 | 23.7 | 27.4 |
| Pantoprazole | 15.2 | 23.4 | 17.3 |
| Paroxetine | 33 | 12.7 | 37.5 |
| Penicillin G | 8.8 | 40.9 | 7.6 |
| Phalloidin | 26.1 | 36.2 | 24.1 |
| Phenacetin | 34.2 | -0.9 | -0.1 |
| Phenformin | 34.5 | -2.3 | 16.7 |
| Phenobarbital | -3.9 | 7.3 | 11 |
| Phenylbutazone | 25.1 | 17 | 15.1 |
| Phenylethyl isothiocyanate | -16.4 | 12.3 | -32 |
| Phenytoin | 25.4 | 20.2 | 11.1 |
| Pilsicainide | 11.2 | -16.8 | 10.8 |
| Pindolol | -24.7 | 24.5 | -10.8 |
| Pioglitazone | 21.9 | 5.6 | -9.9 |
| Prazosin | 25.1 | 36.1 | 1 |
| Prednisolone | 2.3 | 9.6 | 27.6 |
| Probenecid | 35.2 | 22 | 26.4 |
| Procainamide | 21.6 | 3.4 | 17.8 |
| Propranolol | 8.9 | 12.8 | 10.1 |
| Quinidine | 43.3 | 37.2 | -10 |
| Ranolazine | 4.3 | -14.9 | 41.6 |
| Sanguinarine | -0.2 | 2.9 | 15 |
| Sildenafil | 22.2 | 30.8 | 45.9 |
| Sotalol | 46.4 | 6.6 | 19 |
| Sulfaphenazole | 16.9 | 0.7 | 17.1 |
| Tamoxifen | 25.6 | 28.5 | 11 |
| Tenofovir | 18.4 | 33.6 | 23.3 |
| Terfenadine | 42.7 | 33.3 | 3.7 |
| Testosterone | 33.2 | 37.4 | -175.1 |
| Tetraethylammonium | 25.9 | 4.9 | 6.7 |
| Theofylline | 0.8 | 2.5 | -25.5 |
| Thioridazine | 26.1 | 25.2 | 21.5 |
| Thiotepa | -40.3 | 0.6 | 16.2 |
| Ticlopidine | 19.4 | 0.2 | 23 |
| Tolbutamide | 9.1 | 5.8 | 15.5 |
| Topotecan | 36.7 | 17.3 | -39.2 |
| Tranylcypromine | -9.5 | -9.3 | 15.5 |
| Triazolam | 45.7 | 18.7 | -20.5 |
| Trimethoprim | 24.7 | 18.8 | 35.1 |
| Valaciclovir | 8.3 | 6.6 | -7.3 |
| Varenicline | 24.7 | 0.8 | 27.3 |
| Warfarin | 27.8 | -3.2 | 31.6 |
| Verapamil | 40.3 | -9 | -62.6 |
| Zidovudine | 9.5 | 7.1 | 4.7 |

